Object detection (1): Selective Search

Source: Internet
Author: User

Original address: https://zhuanlan.zhihu.com/p/27467369

A few days ago TensorFlow open-sourced a number of object detection models, including Faster R-CNN, SSD, and others. My own paper also happens to be about a detection network, so this is a good time to sort out the detection networks I have worked through before. The main ones I have read are the region-proposal-based R-CNN series (R-CNN, Fast R-CNN, Faster R-CNN), YOLO and SSD, which divide the image into regions, AttentionNet, which is based on reinforcement learning, and the latest Mask R-CNN. I will take about a week to introduce each model in detail and then run through the TensorFlow models. Speaking of detection methods based on region proposals, we have to mention the selective search method used by R-CNN, so let's start with that paper.

Paper address: http://www.huppelen.nl/publications/selectiveSearchDraft.pdf

Summary

To understand why generating region proposals for object detection is hard, let's look at the set of example images from the paper:

Since we do not know in advance which categories need to be detected, the table, the bottle, and the cutlery in the first picture are all candidate objects. Moreover, the cutlery lies inside the table region, and the spoon is inside the bowl. The picture shows that objects form a hierarchy and appear at different scales, so how do we find the locations of all these possible objects? The conventional approach is exhaustive search: slide windows of different sizes over the image at different scales to cover every possible position. The drawback is obvious: the amount of computation is too large, and because not every scale can be covered, the resulting locations cannot be very precise. Can we use visual features to cut down the number of candidate locations and improve their accuracy? That is what this paper sets out to do.
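To get a feel for why exhaustive search is expensive, here is a small sketch that simply counts the window positions a sliding-window search would visit. The image size, window sizes, aspect ratios, and stride are made-up values for illustration, not numbers from the paper.

```python
def count_sliding_windows(img_w, img_h, window_sizes, stride):
    """Count the window positions an exhaustive search has to evaluate."""
    total = 0
    for win_w, win_h in window_sizes:
        if win_w > img_w or win_h > img_h:
            continue  # this window does not fit in the image
        positions_x = (img_w - win_w) // stride + 1
        positions_y = (img_h - win_h) // stride + 1
        total += positions_x * positions_y
    return total


# Hypothetical setup: a 500x375 image, window heights from 32 to 256 px,
# three aspect ratios, and a stride of 4 px.
sizes = [(int(h * ratio), h) for h in (32, 64, 128, 256) for ratio in (0.5, 1.0, 2.0)]
print(count_sliding_windows(500, 375, sizes, stride=4))
```

Even this coarse grid gives a large number of windows, and every one of them would have to go through the classifier.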

Many features are available, so which ones are useful? Look at the two cats in the second picture: their textures are the same, so texture will not separate them, but color distinguishes them very well. For the chameleon in the third picture color does not work, and edge and texture features are more useful there. In the last picture we naturally think of the car and its wheels as one object, yet the two differ greatly in color, texture, and edges. And these are only a few cases. Natural images vary so much: which features should we use to tell objects apart, and at which scale?

The selective search strategy is this: since we do not know the right scale, we try to cover all scales, but unlike brute-force exhaustive search, we first obtain small regions and then merge them step by step into larger ones, which also matches how human vision works. Since there are many features, we use all the ones we know, while keeping the computational cost in check, otherwise the method would be no better than exhaustive search. Finally, the regions should be ranked so that we can take as many candidates as we need instead of always getting the full set. That is the whole idea of the paper, so let's go through it step by step.

Generating multi-scale region proposals


First, the initial regions are produced by an image segmentation method, which divides the image into many small pieces. We then apply a greedy strategy: compute the similarity of every pair of adjacent regions, merge the two most similar regions, and repeat until only a single region covering the whole image is left. Every region produced along the way, including the merged ones, is kept, which gives a hierarchical representation of the image. So how do we compute the similarity of two regions?
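The greedy merging loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the integer region ids, and the toy `(size, mean_value)` regions are my own assumptions, and the similarity and merge functions are passed in so that any of the measures discussed later could be plugged in.

```python
def hierarchical_grouping(regions, adjacent_pairs, similarity, merge):
    """Greedy bottom-up grouping (illustrative sketch).

    regions:        dict of integer region id -> region data
    adjacent_pairs: iterable of (id_a, id_b) for neighbouring regions
    similarity:     function(region_a, region_b) -> float, higher = more similar
    merge:          function(region_a, region_b) -> data of the merged region
    Returns every region ever produced, initial regions included.
    """
    regions = dict(regions)             # do not modify the caller's dict
    proposals = list(regions.values())  # the initial regions are proposals too
    sims = {}
    for a, b in adjacent_pairs:
        sims[(min(a, b), max(a, b))] = similarity(regions[a], regions[b])
    next_id = max(regions) + 1

    while sims:
        # Pick the most similar pair of adjacent regions and merge it.
        a, b = max(sims, key=sims.get)
        merged = merge(regions[a], regions[b])
        # Drop every similarity that involves a or b, remembering their neighbours.
        touching = set()
        for pair in list(sims):
            if a in pair or b in pair:
                del sims[pair]
                touching.update(pair)
        touching -= {a, b}
        del regions[a], regions[b]
        regions[next_id] = merged
        proposals.append(merged)
        # The merged region inherits the neighbours of both parts.
        for n in touching:
            sims[(n, next_id)] = similarity(regions[n], merged)
        next_id += 1
    return proposals


# Toy example: a region is (size, mean_value); four regions in a row.
def toy_similarity(r, s):
    return -abs(r[1] - s[1])

def toy_merge(r, s):
    size = r[0] + s[0]
    return (size, (r[0] * r[1] + s[0] * s[1]) / size)

regions = {0: (10, 0.1), 1: (10, 0.2), 2: (10, 0.9), 3: (10, 0.8)}
pairs = [(0, 1), (1, 2), (2, 3)]
proposals = hierarchical_grouping(regions, pairs, toy_similarity, toy_merge)
print(len(proposals))  # 4 initial regions + 3 merges
```

Note that only the similarities involving the newly merged region are recomputed in each round, which is why the merged features need to be cheap to derive.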

Strategies for maintaining diversity

To cover all the scenes in an image as well as possible, we have to keep the features diverse. The paper maintains diversity in two main ways. One is color space transformation: the original image is converted into up to eight color spaces. The other is the use of several distance measures that combine color, texture, and other cues.

Color space transformation

That is a lot of color spaces: the ones I know and the ones I don't are all in there.

Distance calculation method

The distance measure has to meet two conditions. First, it must be fast, because we have so many region proposals and so many diversification strategies. Second, the features of a merged region must be easy to compute from the features of its parts: we merge regions greedily, and recomputing everything from scratch after every merge would be far too expensive.

1. Color distance

Having gone to the trouble of using so many color spaces, the first thing to compute is of course the color distance.

The distance is simple to compute: build a color histogram for each channel, then take the minimum of each pair of corresponding bins in the two regions' histograms and sum these minima (histogram intersection). The histogram of a merged region is also easy to compute: weight each region's histogram by the region's size, add the two, and divide by the total size.
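As a rough sketch of the two operations just described, histogram intersection and the size-weighted merge, consider the following. The four-bin histograms and the region sizes are toy values I made up; a real histogram would concatenate the bins of all channels.

```python
def normalise(hist):
    """L1-normalise a histogram so that its bins sum to 1."""
    total = sum(hist)
    return [v / total for v in hist]


def colour_similarity(hist_a, hist_b):
    """Histogram intersection: the sum of the bin-wise minima."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def merge_histograms(hist_a, size_a, hist_b, size_b):
    """Histogram of the merged region as a size-weighted average."""
    total = size_a + size_b
    return [(size_a * a + size_b * b) / total for a, b in zip(hist_a, hist_b)]


# Toy histograms (assumed values, 4 bins only).
h1 = normalise([8, 2, 0, 0])
h2 = normalise([0, 0, 5, 5])
h3 = normalise([6, 4, 0, 0])

print(colour_similarity(h1, h3))  # similar colours, large intersection
print(colour_similarity(h1, h2))  # no shared bins, intersection is 0
merged = merge_histograms(h1, 100, h3, 300)
print(merged)
```

Because the merged histogram is computed from the two existing histograms and the region sizes alone, no pixel has to be visited again after a merge.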

2. Texture distance

The texture distance is computed in almost the same way as the color distance. For each region we compute fast SIFT-like features with 8 orientations, 3 channels, and 10 bins per channel, which gives a 240-dimensional texture histogram per region (8 x 3 x 10 = 240), and the distance is then computed from these histograms.

3. Merging small regions first

If regions were merged on color and texture alone, an already merged region would tend to swallow its neighbors one after another. As a result, the multiple scales would come from one part of the image only rather than from the whole image. So small regions are given more weight, which ensures that regions at every scale are produced at every location in the image.
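The formula for this term is not shown in the text above. The sketch below uses the form I remember from the Selective Search paper, one minus the fraction of the image that the two regions cover together, so treat the exact expression as an assumption. The region and image sizes are made-up pixel counts.

```python
def size_similarity(size_a, size_b, image_size):
    """Close to 1 for two small regions, close to 0 for two large ones.

    Assumed form: 1 - (size_a + size_b) / image_size, sizes in pixels.
    """
    return 1.0 - (size_a + size_b) / image_size


image_size = 10_000
small_pair = size_similarity(100, 150, image_size)
large_pair = size_similarity(4_000, 3_000, image_size)
print(small_pair, large_pair)
```

With everything else equal, the pair of small regions gets the higher score and is therefore merged first.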

4. Fill: how well two regions fit together

It is not enough to look at how well the features of two regions match; how well the regions fit together also matters. The idea is that a merged region should be as regular as possible and should not contain large gaps, which would clearly go against common sense. In practice this means the two regions should cover a large part of their common bounding rectangle: the smaller the part of that rectangle left uncovered, the better the fit. The fill distance of two regions is defined on this basis.

5. Combining the distances

Now that the individual distances are computed, what remains is to combine them, so that different combination strategies yield different region proposals. The simplest way is of course a weighted sum of the distances.

6. Diversity of initialization parameters

The initial regions come from graph-based image segmentation, and they have a large effect on the final result. We can therefore run the segmentation with several different parameter settings, which adds further diversity.

Scoring the regions

The steps above give us a great many regions, but obviously not every region is equally likely to be an object, so we need to measure this likelihood in order to keep only as many regions as we need.

The paper assigns a weight according to the order of merging: for example, the last region, the full image, gets weight 1, the region from the second-to-last merge gets weight 2, and so on, so regions merged earlier get larger weights. But with many strategies and a lot of diversity, these weights coincide far too often, which makes sorting difficult. So the paper multiplies each weight by a random number (after all, some of it comes down to luck), and the weights of a region that appears several times are accumulated: if many methods say a region is an object, there is probably a reason for it. This gives an object score for every region, and I can take as many regions as I need.

Object recognition based on selective search


Selective search for object recognition
Merging regions as described above yields a set of object location hypotheses, L. The next task is to find out which of them are real object locations and to determine the category of each object.

(1) Feature generation: the implementation extracts features using spatial pyramid division.
(2) Iterative training: first, the object windows given by the ground truth are taken as positive examples, and windows that overlap a positive window by 20%~50% are taken as negative examples. During selection, negative examples that overlap each other by more than 70% are rejected, which gives a good initialization (the initial samples are the ones shown under Training Examples in the paper's figure). In each iteration, negative examples that receive a high score are added to increase the number of hard examples. The model is retrained until it converges (the precision no longer changes).

How to understand the sentence "Select the windows that overlap a positive window by 20%~50% as negative examples; during selection, reject the negative examples that overlap each other by more than 70%":

(1) "Select the windows that overlap a positive window by 20%~50% as negative examples": each location window in the Object Hypotheses image (the lower one) is compared with the location windows in the Ground Truth image (the upper one). If a window in the lower image overlaps a window in the upper image by 20%~50%, it is kept as a negative example.

(2) "Reject the negative examples that overlap each other by more than 70%": this can be seen as a second screening of the windows kept in step (1). The kept windows are compared with each other, and if several of them overlap by more than 70%, only one of them is kept.
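The two screening steps can be sketched as below. This is only an illustration under my own assumptions: boxes are `(x1, y1, x2, y2)` tuples with positive area, "overlap" is measured as intersection over union, and the function names and example boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes with positive area."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def select_negatives(proposals, ground_truth, low=0.2, high=0.5, dup=0.7):
    """Pick initial negative examples from the proposal windows."""
    negatives = []
    for box in proposals:
        # Step (1): keep windows whose best overlap with a positive window is 20%~50%.
        best = max((iou(box, gt) for gt in ground_truth), default=0.0)
        if not low <= best <= high:
            continue
        # Step (2): skip a window that overlaps an already kept negative by more than 70%.
        if any(iou(box, kept) > dup for kept in negatives):
            continue
        negatives.append(box)
    return negatives


ground_truth = [(0, 0, 100, 100)]
hypotheses = [
    (50, 0, 150, 100),     # overlaps the positive window by about 33%
    (52, 0, 152, 100),     # near-duplicate of the window above
    (0, 0, 100, 100),      # identical to the positive window
    (200, 200, 300, 300),  # no overlap at all
    (0, 60, 100, 160),     # overlaps the positive window by 25%
]
negatives = select_negatives(hypotheses, ground_truth)
print(negatives)
```

In this toy run the near-duplicate, the exact match, and the non-overlapping window are all dropped, so only two windows remain as negative examples.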


This is a typical application: we obtain many region proposals, extract various features over a spatial pyramid, concatenate them into one feature vector, and train an SVM to classify which regions are the objects we actually want. The same proposals can of course be used for object detection, and the next thing to talk about is R-CNN. Its features are more advanced than the SIFT features used in this paper, since CNNs are very strong at feature representation.





