Selective Search for Object Recognition (surgewong@gmail.com)
http://blog.csdn.net/surgewong
I have been busy reading related papers for a while and had not found time to organize my understanding of this one. A previous post [1] mentioned Selective Search [2]: in its early stage it uses an image segmentation method to obtain a set of initial regions (see [1]), and then applies several merging strategies to combine these regions into a hierarchical region structure, which contains the objects we may be looking for. The post [3] already gives a brief introduction to this paper, so writing this entry inevitably reinvents the wheel somewhat; without overthinking it, I simply want to write down my own understanding in order to deepen it.
This paper, published by J.R.R. Uijlings et al. in IJCV, mainly introduces the method of selective search. Object recognition, that is, identifying an object in an image and finding its exact location, has produced many results after a long period of development. Earlier approaches were based on exhaustive search: choose a window, scan the entire image with it, change the window size, and scan the entire image again. This approach is obviously rather "primitive": changing the window size and sweeping the whole image intuitively gives the impression of being very time-consuming and of producing cluttered results. The authors break out of that pattern of thinking and, from another point of view, give a simple and effective method that commands admiration. One cannot help wondering why such a simple method was not thought of before. I think this has to do with how people conceived of image recognition: when nobody knew how to do object recognition, the more "primitive" exhaustive search gave everyone a direction, everyone followed that direction, and in the end other ways of thinking were overlooked. It took many years to find this other direction, and such a change is not easy. Digression aside, in short, this method is genuinely refreshing.
I. Introduction (INTRODUCTION)

Images contain very rich information; objects come in different shapes, scales, colors, and textures. Identifying an object in an image is already very difficult, and also finding its position in the image makes it even harder. The figure below gives four examples that illustrate the complexity and difficulty of object recognition. (a) The scene is a table with bowls, bottles, and other tableware on it. To recognize the table, for example, we may refer either to the table itself or to the objects on top of it; this shows that objects in an image stand in a certain hierarchical relationship to one another. (b) Two cats are shown: they can be found by texture, but they must be distinguished from each other by color. (c) A chameleon is close in color to its surroundings and can only be distinguished by texture. (d) For a vehicle, it is natural to treat the body and the wheels as one whole, even though they differ greatly in both texture and color.
These examples briefly show that during object recognition no single strategy can distinguish all kinds of objects; the diversity of image content must be fully taken into account. In addition, the layout of objects in an image has a certain hierarchical structure, and taking this relationship into account helps distinguish object categories. Before delving into selective search, here are the issues the method needs to address: 1. Capture all scales: exhaustive search adapts to objects of different scales by varying the window size, and selective search cannot avoid this problem either; the algorithm handles it effectively with image segmentation plus a hierarchical grouping algorithm. 2. Diversification: a single strategy cannot cope with all categories of images, so regions [1] are merged under multiple strategies based on color, texture, size, and so on. 3. Fast to compute: like kung fu, the algorithm must above all be fast.
II. Region Merging Algorithm

The method works by merging regions: regions carry more information than pixels and represent the characteristics of an object more effectively. For how regions can be used for object recognition, see the paper [4]; I will not go into it here and may write it up in a later post when I have time. For how the initial regions are obtained, see the post [1] and the related paper. The regions are merged hierarchically, in a process similar to the construction of a Huffman tree.
Input: a color image (three channels)
Output: a set L of hypothesized object locations

1. Use the Efficient Graph-Based Image Segmentation method [1] to obtain the initial regions R = {r1, r2, ..., rn}.
2. Initialize the similarity set S = ∅.
3. Compute the similarity between every pair of adjacent regions (see Part III) and add it to the similarity set S.
4. Find the two most similar regions ri and rj in S and merge them into a region rt. Remove from S all similarities involving ri or rj, compute the similarities between rt and its adjacent regions (the regions formerly adjacent to ri or rj), add these to S, and add the new region rt to the region collection R. Repeat until S is empty.
5. Take the bounding box of each region in R; these boxes are the hypothesized object locations L.
My notes: ① Steps 3-4 describe a loop whose termination condition is that S becomes empty. S holds the similarities of adjacent regions; initially it contains the similarities of all neighbouring initial regions: s(r1, r2), s(r2, r3), ... Whenever two regions are merged, every similarity in S involving either of them is erased, and the similarities between the merged region and its surrounding regions are computed and added back; each merge therefore leaves S at least one element smaller (the similarity between the two merged regions is gone). As the loop continues, S keeps shrinking until all regions have been merged into one. ② By contrast, R only ever grows and never shrinks, because both the initial regions and every region produced by merging are kept in it.
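The merging loop above can be sketched in Python. This is my own minimal sketch, not the paper's code: regions are represented as plain sets of pixel ids, the similarity function is an arbitrary parameter, and all names are mine. The real algorithm would start from Felzenszwalb-Huttenlocher segments and use the similarity measures of Part III.

```python
def selective_search_merge(regions, adjacency, sim):
    """Hierarchical grouping: repeatedly merge the most similar
    neighbouring pair until only one region remains.

    regions   : dict, region id -> region data (here: a set of pixels)
    adjacency : set of frozenset({i, j}) pairs of neighbouring ids
    sim       : function (region_a, region_b) -> similarity score
    Returns every region ever created (R only grows, never shrinks).
    """
    R = dict(regions)                          # region collection R
    S = {p: sim(R[min(p)], R[max(p)])          # similarity set S
         for p in adjacency}
    next_id = max(R) + 1
    while S:
        # find the most similar neighbouring pair (r_i, r_j)
        pair = max(S, key=S.get)
        i, j = tuple(pair)
        t = next_id
        next_id += 1
        R[t] = R[i] | R[j]                     # merged region r_t
        # neighbours of r_t = former neighbours of r_i or r_j
        nbrs = {k for p in list(S) if i in p or j in p for k in p} - {i, j}
        # remove every similarity involving r_i or r_j ...
        S = {p: v for p, v in S.items() if i not in p and j not in p}
        # ... and add the similarities between r_t and its neighbours
        for k in nbrs:
            S[frozenset({t, k})] = sim(R[t], R[k])
    return R
```

Note how S shrinks by at least one entry per iteration while R gains one region per iteration, exactly as observed in the notes above.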
III. Diversification Strategies

The authors give two diversification strategies: diversifying the color space, and diversifying the similarity measures.
Color Space Diversity

The authors use 8 different color spaces, mainly to account for different scenes and lighting conditions. This strategy is applied chiefly when generating the initial regions with the segmentation algorithm of [1]. The color spaces used are: (1) RGB, (2) grayscale I, (3) Lab, (4) rgI (normalized rg channels plus grayscale), (5) HSV, (6) rgb (normalized RGB), (7) C (see papers [2] and [5]), (8) H (the H channel of HSV).
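As an illustration, two of these conversions, normalized rgb and rgI, can be sketched as follows. This is my own sketch rather than the paper's code, and it assumes a float image in H x W x 3 layout:

```python
import numpy as np

def to_normalized_rgb(img):
    """rgb: each channel divided by R + G + B (chromaticity),
    which discards intensity and keeps only colour ratios."""
    s = img.sum(axis=2, keepdims=True)
    return img / np.maximum(s, 1e-12)

def to_rgI(img):
    """rgI: the normalised r and g channels plus the grayscale
    intensity I = (R + G + B) / 3."""
    rgb = to_normalized_rgb(img)
    intensity = img.mean(axis=2, keepdims=True)
    return np.concatenate([rgb[..., :2], intensity], axis=2)
```

A uniformly gray pixel maps to (1/3, 1/3) chromaticity in rgb, which is why these spaces are robust to lighting intensity changes.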
I do not yet understand these color spaces deeply, so I will not elaborate on them here; that will come as I slowly work my way into the field of computer vision.
Diversity of Similarity Measures

When merging regions, the similarity between regions must be computed; the paper introduces four similarity measures.

1. Color similarity. For each region, take a 25-bin histogram of each color channel of the image and normalize with the L1 norm, so that each region gets a 75-dimensional vector C_i = {c_i^1, ..., c_i^75}. The color similarity between two regions is then the histogram intersection:

s_colour(r_i, r_j) = Σ_{k=1}^{75} min(c_i^k, c_j^k)

The histogram of a newly merged region can be computed directly from the histograms of its parts:

C_t = (size(r_i) × C_i + size(r_j) × C_j) / (size(r_i) + size(r_j))
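A hedged sketch of the colour descriptor, the intersection similarity, and the merged-histogram update; the function names and the 0-255 pixel range are my own assumptions:

```python
import numpy as np

def colour_histogram(pixels_per_channel, bins=25):
    """25-bin histogram per colour channel, concatenated over the
    3 channels and L1-normalised -> a 75-D descriptor."""
    hists = [np.histogram(c, bins=bins, range=(0, 255))[0]
             for c in pixels_per_channel]
    h = np.concatenate(hists).astype(float)
    return h / h.sum()

def s_colour(hist_i, hist_j):
    """Histogram intersection: sum of element-wise minima."""
    return np.minimum(hist_i, hist_j).sum()

def merged_histogram(hist_i, size_i, hist_j, size_j):
    """Size-weighted average gives the histogram of r_t = r_i ∪ r_j
    without touching the pixels again."""
    return (size_i * hist_i + size_j * hist_j) / (size_i + size_j)
```

Because the vectors are L1-normalised, two identical regions have similarity 1 and completely disjoint colour distributions have similarity 0.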
2. Texture similarity. Texture is measured here with SIFT-like features. Concretely, for each color channel, Gaussian derivatives with σ = 1 are computed in 8 different orientations, and each orientation of each channel yields a 10-bin histogram (L1-normalized), giving a 240-dimensional vector (3 × 8 × 10). Texture similarity between regions is computed with the same histogram-intersection formula as color similarity, and the texture histogram of a merged region is propagated in the same size-weighted way:

s_texture(r_i, r_j) = Σ_{k=1}^{240} min(t_i^k, t_j^k)
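A sketch of the per-channel texture measurement as I understand it; this is my own approximation (directional derivatives built from the x and y Gaussian derivatives), not the paper's implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def texture_descriptor(channel, n_orient=8, bins=10):
    """SIFT-like texture measurement for one colour channel:
    Gaussian derivatives (sigma = 1) in n_orient orientations,
    a `bins`-bin L1-normalised histogram per orientation.
    Over 3 channels this yields 3 * 8 * 10 = 240 dimensions."""
    dy = gaussian_filter(channel, sigma=1, order=(1, 0))
    dx = gaussian_filter(channel, sigma=1, order=(0, 1))
    feats = []
    for k in range(n_orient):
        theta = np.pi * k / n_orient
        # directional derivative at angle theta
        resp = np.cos(theta) * dx + np.sin(theta) * dy
        h, _ = np.histogram(resp, bins=bins)
        feats.append(h)
    h = np.concatenate(feats).astype(float)
    return h / max(h.sum(), 1e-12)
```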
3. Size similarity. Size here means the number of pixels in a region. This measure is used mainly to encourage small regions to merge first:

s_size(r_i, r_j) = 1 - (size(r_i) + size(r_j)) / size(im)

where size(im) is the number of pixels in the whole image.
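In code this is a one-liner (`size_image` denotes the pixel count of the whole image):

```python
def s_size(size_i, size_j, size_image):
    """1 - (size(r_i) + size(r_j)) / size(im): pairs of small
    regions score high, so small regions merge before large ones."""
    return 1.0 - (size_i + size_j) / size_image
```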
4. Fill similarity. This measures whether two regions fit well together: the smaller the gap left inside their joint bounding box (the smallest axis-aligned rectangle, without rotation, that encloses both regions), the better the fit. It is computed as:

s_fill(r_i, r_j) = 1 - (size(BB_ij) - size(r_i) - size(r_j)) / size(im)

where BB_ij is the bounding box around r_i and r_j together.
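A minimal sketch of this formula; I assume boxes given as (x1, y1, x2, y2) with exclusive upper bounds, and region sizes as pixel counts:

```python
def s_fill(box_i, box_j, size_i, size_j, size_image):
    """1 - (size(BB_ij) - size(r_i) - size(r_j)) / size(im),
    where BB_ij is the tight box around both regions."""
    x1 = min(box_i[0], box_j[0])
    y1 = min(box_i[1], box_j[1])
    x2 = max(box_i[2], box_j[2])
    y2 = max(box_i[3], box_j[3])
    bb = (x2 - x1) * (y2 - y1)              # area of BB_ij
    return 1.0 - (bb - size_i - size_j) / size_image
```

Two regions that tile their joint box exactly (no gap) get the maximum score of 1.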
Finally, the four measures are combined into a single similarity:

s(r_i, r_j) = a_1 s_colour(r_i, r_j) + a_2 s_texture(r_i, r_j) + a_3 s_size(r_i, r_j) + a_4 s_fill(r_i, r_j)

where each a_k ∈ {0, 1} switches the corresponding measure on or off.
IV. Using Selective Search for Object Recognition

The region merging above yields a set L of hypothesized object locations. The next task is to find the true locations of the objects and determine their categories. Commonly used features for object recognition are HOG (histograms of oriented gradients) and bag-of-words. Under exhaustive search, finding candidate locations already costs a great deal of time, so the recognition features cannot be too expensive and only relatively cheap ones can be used. Because selective search obtains the location hypotheses efficiently, it can afford computationally heavier, more expressive features such as SIFT. For classification the system uses an SVM.
Feature Generation

The implementation uses Color-SIFT features [6] and the spatial pyramid division of [7]. Features are sampled at a single scale with σ = 1.2. SIFT, Extended OpponentSIFT [8], and RGB-SIFT [6] features are extracted over a four-level pyramid of 1x1, 2x2, 3x3, and 4x4 cells, yielding a fixed-length feature vector. (Note: I do not understand the SIFT features and the pyramid model very well, so I cannot explain them properly.)
Training Process

Training uses an SVM. First, the windows containing the ground-truth objects are selected as positive examples, and windows overlapping a positive window by 20%~50% are selected as negative examples; during this selection, candidate negatives that overlap another negative by more than 70% are excluded. This provides a good initialization. Hard negative examples (negatives with high scores) [9] are then added during the iterations; because the initialization is good, the model needs only about two iterations. (The choice of samples matters a great deal.)
V. Performance Evaluation

Naturally, the more the bounding boxes computed by the algorithm overlap the ground-truth windows, the better the algorithm performs. The paper uses the Average Best Overlap (ABO). For a fixed class c, with ground-truth annotations g_i^c ∈ G^c and the computed location hypotheses L, the ABO is:

ABO = (1 / |G^c|) Σ_{g_i^c ∈ G^c} max_{l_j ∈ L} Overlap(g_i^c, l_j)
The overlap score is the intersection over union of the two regions:

Overlap(g_i^c, l_j) = area(g_i^c ∩ l_j) / area(g_i^c ∪ l_j)
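The two formulas above can be sketched for axis-aligned boxes as follows (my own sketch; boxes are assumed to be (x1, y1, x2, y2) tuples):

```python
def overlap(box_a, box_b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    ix = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union else 0.0

def abo(ground_truths, hypotheses):
    """Average Best Overlap: mean, over the ground-truth boxes of one
    class, of the best overlap achieved by any hypothesis in L."""
    return sum(max(overlap(g, l) for l in hypotheses)
               for g in ground_truths) / len(ground_truths)
```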
The above gives the ABO for one class; for evaluation over all classes, the mean of the per-class ABO values, MABO (Mean Average Best Overlap), is naturally used.
That covers the basic framework of the paper. The MATLAB code accompanying the paper can be downloaded from [2] (also from [10]); it produces the location hypotheses L for the objects in an image, and this result can then be used for further study. Because parts of the original MATLAB code are encrypted, I am rewriting it in C++ and will open-source it once the results look right. Being new to the field, the understanding above surely contains some mistakes; I hope you will correct them, and I look forward to exchanging ideas with everyone.
Added 2015-02-05: various chores kept me from updating the blog for a long time. I am now studying machine learning, and have had no time to clean up the image segmentation code, so I am directly packaging the earlier project. There are surely mistakes or shortcomings; I hope readers will point them out. The C++ code is only a rewrite of the MATLAB parts of the project, and I have not carefully compared the performance of the two, but it still helps a great deal in understanding the principle; I hope it is useful to beginners. Code download link: [11].
References:
[1] CSDN: Efficient Graph-Based Image Segmentation
[2] Selective Search
[3] CSDN: Selective Search for Object Recognition
[4] Recognition Using Regions
[5] Color Invariance
[6] Evaluating Color Descriptors for Object and Scene Recognition
[7] Spatial Pyramid Matching for Recognizing Natural Scene Categories
[8] Illumination-Invariant Descriptors for Discriminative Visual Object Categorization, Technical Report, University of Amsterdam (no related link found)
[9] Object Detection with Discriminatively Trained Part Based Models
[10] Related source code (MATLAB)
[11] C++ simplified version code
This article is reproduced from: http://blog.csdn.net/mao_kun/article/details/50576003