This is a relatively fast algorithm for object retrieval, based on the paper "Randomized Visual Phrases for Object Search".
The flow of the algorithm is as follows:
Image training stage
1 Read n images
2 Convert to grayscale
3 Detect the feature points of the n images; this algorithm uses SIFT features.
4 Generate descriptors
5 Cluster the descriptors to generate visual words, using the kmeans() algorithm.
6 Re-describe each image in the image library
Every feature point in each image is represented as [x, y, v], where x and y are the coordinates of the feature point and v is the index of the visual word that the feature point corresponds to.
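The [x, y, v] representation can be illustrated with a minimal Python sketch. It assumes the vocabulary (the cluster centers from step 5) is already available, and it uses toy 2-D vectors in place of real 128-dimensional SIFT descriptors. The names nearest_word and describe_image are hypothetical helpers, not part of the original code.

```python
import math

def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary word closest to the descriptor (Euclidean)."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))

def describe_image(keypoints, descriptors, vocabulary):
    """Turn (x, y) keypoints and their descriptors into [x, y, v] triples."""
    return [[x, y, nearest_word(d, vocabulary)]
            for (x, y), d in zip(keypoints, descriptors)]

# Toy data (assumption): two visual words and two feature points.
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
keypoints = [(5, 7), (40, 12)]
descriptors = [(1.0, 0.5), (9.0, 11.0)]

features = describe_image(keypoints, descriptors, vocabulary)
print(features)
```

With this representation the descriptors themselves no longer need to be stored for the library images; only the position and the word index are kept.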
7 Generate the inverted file
8 Apply the stop list
Using the inverted file, count for each word the number of images that contain it, and delete the words that occur with high frequency. This step helps retrieval accuracy considerably.
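Steps 7 and 8 might look roughly like the following Python sketch. It assumes each library image is stored as a list of [x, y, v] features. The cutoff max_fraction is an illustrative parameter; the original text does not say how "high-frequency" is decided.

```python
from collections import defaultdict

def build_inverted_file(images):
    """Map each visual word index to the set of image ids that contain it.

    `images` maps an image id to its list of [x, y, v] features.
    """
    inverted = defaultdict(set)
    for image_id, features in images.items():
        for _x, _y, v in features:
            inverted[v].add(image_id)
    return inverted

def apply_stop_list(inverted, num_images, max_fraction=0.5):
    """Drop words found in more than max_fraction of the images (assumed rule)."""
    return {v: ids for v, ids in inverted.items()
            if len(ids) / num_images <= max_fraction}

# Toy library (assumption): word 0 occurs in every image.
images = {
    "a": [[1, 1, 0], [2, 3, 1]],
    "b": [[4, 4, 0], [5, 1, 2]],
    "c": [[7, 2, 0], [3, 3, 1]],
}
inverted = build_inverted_file(images)
filtered = apply_stop_list(inverted, num_images=len(images), max_fraction=0.7)
print(sorted(filtered))
```

Word 0 appears in all three images, so it carries little discriminative information and is removed; words 1 and 2 survive.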
Target search stage
9 Read the target image
10 Convert to grayscale
11 Extract feature points and generate descriptors
12 Generate the visual phrase of the target
A visual phrase is built by replacing every feature point in the target image with the closest word in the visual word vocabulary and then counting the occurrences of each word. In this algorithm, each target feature point is matched to a visual word by the Euclidean distance between its descriptor vector and every visual word. This method is relatively slow; faster matching algorithms are being studied and will be added in a later update.
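A minimal Python sketch of this brute-force construction follows. The function name visual_phrase and the toy 2-D descriptors are assumptions for illustration only.

```python
import math

def visual_phrase(descriptors, vocabulary):
    """Count how often each visual word is the nearest word to a descriptor."""
    histogram = [0] * len(vocabulary)
    for d in descriptors:
        # Brute-force nearest-neighbour search over the whole vocabulary.
        best = min(range(len(vocabulary)),
                   key=lambda i: math.dist(d, vocabulary[i]))
        histogram[best] += 1
    return histogram

# Toy data (assumption): three visual words, four target descriptors.
vocabulary = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
target_descriptors = [(1.0, 1.0), (9.0, 1.0), (8.0, -1.0), (0.5, 0.2)]

target_phrase = visual_phrase(target_descriptors, vocabulary)
print(target_phrase)
```

The cost is one distance computation per descriptor per word, which is why this step is slow for large vocabularies.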
13 Search for the images that contain the target
Retrieve the images closest to the target using the inverted file. The image that best matches the target is found by computing the histogram intersection (HI) or normalized histogram intersection (NHI) between the visual phrase vector of the target image and the visual phrase vector of each library image.
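The two similarity measures can be sketched in Python as follows. HI is taken as the sum of element-wise minima. NHI is assumed here to be the intersection divided by the union (the sum of element-wise maxima); the original text does not spell out the normalization, so treat that choice as an assumption. For brevity the sketch ranks every library image and does not use the inverted file to prune candidates.

```python
def histogram_intersection(h1, h2):
    """HI: sum of the element-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def normalized_histogram_intersection(h1, h2):
    """NHI (assumed form): intersection divided by union."""
    union = sum(max(a, b) for a, b in zip(h1, h2))
    return histogram_intersection(h1, h2) / union if union else 0.0

def rank_images(target, library):
    """Return image ids sorted by decreasing NHI with the target."""
    return sorted(
        library,
        key=lambda k: normalized_histogram_intersection(target, library[k]),
        reverse=True,
    )

# Toy histograms (assumption) over a three-word vocabulary.
target = [2, 2, 0]
library = {"a": [2, 1, 5], "b": [0, 0, 4], "c": [2, 2, 1]}

ranking = rank_images(target, library)
print(ranking)
```

Image "c" shares all four target words and has little else, so it ranks first under NHI.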
14 Randomly partition the retrieved images
Each image is randomly partitioned into rectangular blocks, T times, and the blocks within one partition do not overlap. Every partition has the same number of rows and columns, m*n. In this algorithm, T trades speed against precision. If T is relatively large, localization precision is higher but the search is slower. If T is relatively small, localization precision is lower but the search is faster.
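One way to generate such partitions is to draw random cut positions along each axis, as in the Python sketch below. The helper names, the image size, and the values of T, m, and n are illustrative assumptions; the original text does not describe how the cut positions are sampled.

```python
import random

def random_cuts(length, parts, rng):
    """Pick parts-1 distinct interior cut positions and return all boundaries."""
    cuts = sorted(rng.sample(range(1, length), parts - 1))
    return [0] + cuts + [length]

def random_partition(width, height, m, n, rng):
    """Split an image into m rows and n columns of non-overlapping blocks.

    Each block is (x0, y0, x1, y1) with x1 and y1 exclusive.
    """
    xs = random_cuts(width, n, rng)
    ys = random_cuts(height, m, rng)
    return [(xs[c], ys[r], xs[c + 1], ys[r + 1])
            for r in range(m) for c in range(n)]

rng = random.Random(0)      # fixed seed so the sketch is repeatable
T, m, n = 3, 2, 4           # hypothetical settings
partitions = [random_partition(64, 48, m, n, rng) for _ in range(T)]
print(len(partitions), len(partitions[0]))
```

Because the cuts run across the full width and height, the blocks of one partition tile the image exactly, which keeps them non-overlapping by construction.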
15 Compute the visual phrase of each random block
16 Compute the degree of match between each random block and the target
Compute the histogram intersection (HI) between the visual phrase of each block and the visual phrase of the target. HI is a similarity score rather than a distance: the larger the value, the better the content of the block matches the target.
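Steps 15 and 16 can be sketched in Python as below, assuming the image is stored as [x, y, v] features and each block is given as (x0, y0, x1, y1) with exclusive upper bounds. The helper names and the toy data are hypothetical.

```python
def block_phrase(features, block, vocab_size):
    """Histogram of the visual words whose feature points fall inside the block."""
    x0, y0, x1, y1 = block
    histogram = [0] * vocab_size
    for x, y, v in features:
        if x0 <= x < x1 and y0 <= y < y1:
            histogram[v] += 1
    return histogram

def histogram_intersection(h1, h2):
    """HI: sum of the element-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Toy image (assumption): four feature points, vocabulary of three words.
features = [[1, 1, 0], [2, 1, 1], [6, 1, 1], [7, 2, 2]]
target_phrase = [1, 1, 0]
blocks = [(0, 0, 4, 4), (4, 0, 8, 4)]

scores = [histogram_intersection(block_phrase(features, b, 3), target_phrase)
          for b in blocks]
print(scores)
```

The left block contains both target words and scores 2, while the right block shares only one word and scores 1.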
17 Vote
After T random partitions, m*n*T random image blocks have been generated. A match value is computed between each block and the target, and that value is the block's score. Each block then adds its score to every pixel it contains. The result is a voting map in which the pixels with the highest scores are the ones closest to the target.
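The accumulation can be sketched in a few lines of Python. The block coordinates and scores below are made-up values for a tiny 4x2 image, and the function name vote is hypothetical.

```python
def vote(width, height, scored_blocks):
    """Accumulate block scores into a per-pixel voting map.

    scored_blocks is a list of ((x0, y0, x1, y1), score) pairs,
    with x1 and y1 exclusive.
    """
    votes = [[0.0] * width for _ in range(height)]
    for (x0, y0, x1, y1), score in scored_blocks:
        for y in range(y0, y1):
            for x in range(x0, x1):
                votes[y][x] += score
    return votes

# Two hypothetical partitions of a 4x2 image, each split into two blocks.
scored_blocks = [
    ((0, 0, 2, 2), 1.0), ((2, 0, 4, 2), 3.0),   # partition 1
    ((0, 0, 3, 2), 2.0), ((3, 0, 4, 2), 0.5),   # partition 2
]
votes = vote(4, 2, scored_blocks)
print(votes[0])
```

Pixels covered by high-scoring blocks in several partitions accumulate the largest totals, which is what sharpens the localization as T grows.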
18 Find the point a with the largest value in the voting map.
19 Based on the pixel values of the voting map, determine a rectangular region around a. This rectangular region is taken as the position of the target in the original image.
20 Display the rectangular region.
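Steps 18 and 19 could be realized as in the Python sketch below. The original text does not say exactly how the rectangle is derived from the pixel values, so the rule used here is an assumption: keep the pixels connected to a whose vote is at least ratio times the peak, then take their bounding box. The name locate and the parameter ratio are hypothetical.

```python
from collections import deque

def locate(votes, ratio=0.8):
    """Return the peak (x, y) and an inclusive bounding box (x0, y0, x1, y1)."""
    height, width = len(votes), len(votes[0])
    # Step 18: the pixel with the largest vote.
    peak_y, peak_x = max(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda p: votes[p[0]][p[1]],
    )
    threshold = ratio * votes[peak_y][peak_x]
    # Step 19 (assumed rule): flood-fill from the peak over pixels above threshold.
    seen = {(peak_y, peak_x)}
    queue = deque(seen)
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < height and 0 <= nx < width
                    and (ny, nx) not in seen
                    and votes[ny][nx] >= threshold):
                seen.add((ny, nx))
                queue.append((ny, nx))
    ys = [y for y, _ in seen]
    xs = [x for _, x in seen]
    return (peak_x, peak_y), (min(xs), min(ys), max(xs), max(ys))

# Toy voting map (assumption); the isolated 9 in the corner is not connected to the peak.
votes = [
    [1, 1, 1, 1, 1],
    [1, 9, 9, 1, 1],
    [1, 9, 10, 1, 1],
    [1, 1, 1, 1, 9],
]
peak, box = locate(votes)
print(peak, box)
```

Restricting the region to pixels connected to a keeps a distant high-vote area from stretching the rectangle.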
The experimental results are as follows:
[Figure: the target to retrieve]
[Figure: the retrieval result]
Target retrieval based on randomized visual phrase