Selective search for object recognition

Source: Internet
Author: User
Tags: svm

Selective search is the region-proposal method used in R-CNN.

Selective search's main task is to find candidate object regions in an image that may contain multiple objects.

First of all

What counts as an object in a picture, and how can candidate objects be extracted and distinguished?




  • For Figure (b), we can separate the two cats by color, but not by texture.
  • For Figure (c), we can find the chameleon by texture, but not by color.
  • For Figure (d), we classify the wheels as part of the car, not because their color is similar, nor because their texture is close, but because the wheels are attached to the car (my personal understanding: the car "wraps" the wheels).
    Therefore, a variety of strategies must be combined to have a chance of finding all the objects in a picture.
    In addition, Figure (a) illustrates that objects can have a hierarchical, or nested, relationship: the spoon is inside the pot, and the pot is on the table.
  • In short, object recognition cannot distinguish different objects with a single strategy; the diversity of image objects must be fully considered. Moreover, the layout of objects in an image has a hierarchical relationship, and taking this into account helps distinguish object categories.
  • Before diving into selective search, here are the issues the algorithm needs to address:
  • 1. Capture all scales: exhaustive search adapts to different object scales by varying the window size, and selective search cannot avoid this problem either. The algorithm solves it effectively by starting from an image segmentation and applying a hierarchical algorithm.
  • 2. Diversification: a single strategy cannot handle all categories of images, so regions are merged using a variety of strategies such as color, texture, and size.
  • 3. Fast to compute: only a fast algorithm is useful in practical applications.

  • Multiscale


    Because of the hierarchical relationships between objects, selective search uses a multiscale idea; it can find different objects at different scales.
    Note that "different scales" here does not mean scaling the original image or changing a window size. Instead, segmentation first divides the image into many regions, and a grouping step then merges regions into larger ones, repeating until the whole picture becomes one largest region. This process generates regions at multiple scales, and it also matches the assumption that objects may have hierarchical relationships.


    Selective Search Method Introduction

      1. Use the method from Efficient Graph-Based Image Segmentation to obtain the initial regions
      2. Compute the pairwise similarity between all adjacent regions
      3. Merge the two most similar regions
      4. Recalculate the similarity between the newly merged region and its neighboring regions
      5. Repeat this process until the entire picture is aggregated into one large region
      6. Score each region with a randomized scoring method, rank by score, and take the top-K subset as the selective search result
    Diversification strategies

    Diversity comes from two sources: sampling multiple color spaces, and using multiple strategies to compute similarity when merging.
    8 color spaces are used, including RGB, grayscale, Lab, and more.
    4 similarity measures are adopted: color similarity (the case of Figure 1a), texture similarity (the case of Figure 1b), a principle of merging small objects first, and compatibility between objects (the case of Figure 1d).

    How are regions scored?

    I am not entirely sure here, but based on the author's description and my own understanding, I believe the score is randomized.
    For some merge strategy j, define r_i^j as the region at position i, where i denotes the hierarchy level at which the region was created during merging (i = 1 means the whole picture is a single region, and i increments from there). The score is then defined as v_i^j = RND x i, where RND is a random value in [0, 1].
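A minimal sketch of this randomized ranking, assuming each proposal is tagged with the hierarchy level at which it was created; the sort direction follows the convention of ranking late-merged (large) regions first, and the region names are purely illustrative:

```python
import random

def rank_proposals(regions_with_levels, k=None):
    """Score each region as v = RND * i, where i is the hierarchy level
    (i = 1 for the whole image, incrementing toward the initial segments)
    and RND is drawn uniformly from [0, 1] to break ties between strategies.
    Lower scores rank first, so large, late-merged regions tend to lead."""
    scored = [(random.random() * level, region)
              for region, level in regions_with_levels]
    scored.sort(key=lambda t: t[0])
    ranked = [region for _, region in scored]
    return ranked[:k] if k is not None else ranked

# Hypothetical (region, level) pairs; keep the top-2 proposals.
proposals = rank_proposals([("whole image", 1), ("cat", 5), ("ear", 9)], k=2)
```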

    The general pipeline is as follows; the traditional "features + SVM" approach is used:


      • Features: HoG and bag-of-words.
      • Classifier: an SVM with a histogram intersection kernel.
      • Training samples: positives are the ground-truth boxes; negatives are selective-search regions that overlap a ground-truth box by 20%-50%.
      • Iterative training: at the end of each training round, false positives are added to the negative set and the model is trained again.
    That is the general process; the following sections describe region merging, similarity calculation, and so on in detail.
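A sketch of the classifier stage using scikit-learn's support for callable kernels; the toy features below merely stand in for real HoG / bag-of-words histograms, and all names are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

def histogram_intersection_kernel(X, Y):
    """K(x, y) = sum_k min(x_k, y_k), computed for every pair of rows."""
    return np.array([[np.minimum(x, y).sum() for y in Y] for x in X])

# Toy L1-normalised "histograms" standing in for real features:
# class 0 puts most mass in the first bins, class 1 in the last bins.
rng = np.random.default_rng(0)
X = np.vstack([rng.random((10, 8)) * [4, 4, 4, 4, 1, 1, 1, 1],
               rng.random((10, 8)) * [1, 1, 1, 1, 4, 4, 4, 4]])
X /= X.sum(axis=1, keepdims=True)
y = np.array([0] * 10 + [1] * 10)

clf = SVC(kernel=histogram_intersection_kernel).fit(X, y)
train_acc = (clf.predict(X) == y).mean()
```

Passing a callable as `kernel` makes SVC compute the Gram matrix with it, which is how a histogram intersection kernel can be plugged in without a precomputed matrix.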

    Region merging

    Merging is region-based because a region carries richer information than a pixel and can represent an object's characteristics more effectively. The original regions come from the segmentation method above, and regions are merged in a hierarchical manner, similar to the construction of a Huffman tree.



    Input: color picture (three channels)
    Output: the set L of possible object positions

    1. Use the efficient graph-based image segmentation method to obtain the initial regions R = {r1, r2, ..., rn}
    2. Initialize the similarity set S = ∅
    3. Compute the similarity between every pair of adjacent regions (see below) and add the results to S
    4. Find the most similar pair r_i and r_j in S and merge them into a new region r_t. Remove from S every similarity that was computed with r_i or r_j, compute the similarity between r_t and each of its neighbors (the regions formerly adjacent to r_i or r_j), add those results to S, and add r_t to the region set R. Repeat until S is empty.
    5. Take the bounding box of each region in R; this set of boxes is the possible object positions L
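The steps above can be sketched as runnable code. This is only a skeleton: the segmentation step is replaced by toy pixel sets, and `similarity` / `merge` are caller-supplied stand-ins for the measures described in the next section:

```python
def hierarchical_grouping(regions, neighbours, similarity, merge):
    """Bottom-up grouping.  `regions` maps id -> region data,
    `neighbours` is a set of frozenset({a, b}) adjacency pairs,
    `similarity(r_a, r_b) -> float`, `merge(r_a, r_b) -> region`.
    Returns every region ever created: the multiscale proposal set."""
    regions = dict(regions)
    # Steps 2-3: similarity set S over all adjacent pairs.
    S = {p: similarity(*(regions[i] for i in p)) for p in neighbours}
    proposals = list(regions.values())
    next_id = max(regions) + 1
    while S:  # Step 4: merge the most similar pair r_i, r_j into r_t.
        i, j = tuple(max(S, key=S.get))
        r_t = merge(regions[i], regions[j])
        # Drop every similarity involving r_i or r_j ...
        touched = [p for p in S if i in p or j in p]
        old_neighbours = {k for p in touched for k in p} - {i, j}
        for p in touched:
            del S[p]
        del regions[i], regions[j]
        # ... and recompute similarity between r_t and those neighbours.
        for k in old_neighbours:
            S[frozenset({next_id, k})] = similarity(r_t, regions[k])
        regions[next_id] = r_t
        proposals.append(r_t)
        next_id += 1
    return proposals  # Step 5 would take each region's bounding box.

# Toy run: four single-"pixel" regions in a chain; prefer small merges.
regions = {1: {1}, 2: {2}, 3: {3}, 4: {4}}
neighbours = {frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 4})}
out = hierarchical_grouping(regions, neighbours,
                            similarity=lambda a, b: -len(a | b),
                            merge=lambda a, b: a | b)
# 4 initial regions + 3 merges -> 7 proposals; the last covers everything.
```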

    Diversification strategies

    The author gives two diversification strategies: color space diversification and diversification of the similarity measures.

    Color space diversification: 8 different color spaces are used, mainly to handle different scenes and lighting conditions. This strategy applies to the generation of the primitive regions in the image segmentation algorithm. The color spaces are: (1) RGB, (2) grayscale I, (3) Lab, (4) rgI (normalized rg channels plus grayscale), (5) HSV, (6) rgb (normalized RGB), (7) C, (8) H (the H channel of HSV)

    Diversification of the similarity calculation

    When regions are merged, the similarity between regions must be computed; the paper introduces four similarity measures.

    1. Color similarity. An L1-normalized histogram with 25 bins is computed for each color channel of the image, so each region gets a 75-dimensional vector. The color similarity between regions r_i and r_j is the histogram intersection:
    s_colour(r_i, r_j) = sum_k min(c_i^k, c_j^k)
    During region merging, the histogram of the new region is computed from those of its parts:
    C_t = (size(r_i) x C_i + size(r_j) x C_j) / (size(r_i) + size(r_j)), with size(r_t) = size(r_i) + size(r_j)
    2. Texture similarity. The texture uses SIFT-like features: for each color channel, Gaussian derivatives with variance σ = 1 are computed in 8 different directions, and a 10-bin L1-normalized histogram is built per direction per channel, giving a 240-dimensional vector (3 x 8 x 10). Texture similarity between regions is computed like color similarity, by histogram intersection, s_texture(r_i, r_j) = sum_k min(t_i^k, t_j^k), and the texture features of a merged region are computed the same way as the merged color features.
    3. Size similarity. Size here is the number of pixels contained in a region. Using size in the similarity mainly makes small regions merge first:
    s_size(r_i, r_j) = 1 - (size(r_i) + size(r_j)) / size(im)
    where size(im) is the number of pixels in the whole image.
    4. Fill similarity. This mainly measures how well the two regions "fit" each other: the smaller the bounding box of the combined region (the smallest axis-aligned rectangle that can frame both regions), the better the fit. It is computed as:
    s_fill(r_i, r_j) = 1 - (size(BB_ij) - size(r_i) - size(r_j)) / size(im)
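The four measures (and their combination) can be written compactly; the formulas follow the selective search paper, while the helper names are mine:

```python
import numpy as np

def s_colour(hist_i, hist_j):
    """Histogram intersection of L1-normalised colour histograms."""
    return np.minimum(hist_i, hist_j).sum()

s_texture = s_colour  # texture uses the same intersection on its histograms

def merged_hist(hist_i, size_i, hist_j, size_j):
    """Histogram of the merged region: a size-weighted average,
    C_t = (size_i * C_i + size_j * C_j) / (size_i + size_j)."""
    return (size_i * hist_i + size_j * hist_j) / (size_i + size_j)

def s_size(size_i, size_j, size_im):
    """1 - (size_i + size_j) / size(im): small regions merge first."""
    return 1.0 - (size_i + size_j) / size_im

def s_fill(size_i, size_j, size_bbox, size_im):
    """1 - (size(BB_ij) - size_i - size_j) / size(im): tight fits score high."""
    return 1.0 - (size_bbox - size_i - size_j) / size_im

def s_combined(scores, a=(1.0, 1.0, 1.0, 1.0)):
    """s = a1*s_colour + a2*s_texture + a3*s_size + a4*s_fill, a_k in {0, 1}."""
    return sum(ak * sk for ak, sk in zip(a, scores))
```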

    Finally, the similarity measures are combined as a weighted sum:
    s(r_i, r_j) = a1 s_colour + a2 s_texture + a3 s_size + a4 s_fill
    where each a_k ∈ {0, 1} switches the corresponding measure on or off.
    By merging the regions as above, we obtain the object position hypotheses L. The next task is to find the true positions of the objects and determine their categories. Commonly used object recognition features are HoG (histograms of oriented gradients) and bag-of-words. Exhaustive search spends a great deal of time just finding candidate positions, so its recognition features cannot be too complex and only fast-to-compute features are feasible. Since selective search obtains object positions much more efficiently, it can afford computationally expensive, highly discriminative features such as SIFT. For classification, an SVM is used.
    Feature generation. In the implementation, Color-SIFT features and a spatial pyramid division are used. Features are sampled at a single scale, σ = 1.2. Using SIFT, Extended OpponentSIFT, and RGB-SIFT features with a four-layer pyramid (1x1, 2x2, 3x3, 4x4), a high-dimensional feature vector is obtained. (Note: the author admits the SIFT features and pyramid model are not well understood here.)

    Training process. The training uses an SVM. First, the windows containing the ground truth are selected as positive samples; windows that overlap a positive window by 20%~50% are selected as negative samples. Rejecting negative samples that overlap each other by more than 70% during sampling provides a better initialization. Hard negative examples (high-scoring negative samples) are added during iteration, and because the model starts from a good initialization, only two iterations are needed. (Sample selection matters!)
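The iterative hard-negative loop can be sketched as follows. LinearSVC stands in for the intersection-kernel SVM, the toy Gaussian blobs replace real window descriptors, and all names and thresholds are illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_with_hard_negatives(X_pos, neg_pool, rounds=2, init_neg=40, seed=0):
    """Fit, then add false positives (negatives the model scores > 0)
    back into the negative set and refit.  With a good initialisation,
    a couple of rounds is reportedly enough."""
    rng = np.random.default_rng(seed)
    X_neg = neg_pool[rng.choice(len(neg_pool), init_neg, replace=False)]
    clf = LinearSVC()
    for _ in range(rounds):
        X = np.vstack([X_pos, X_neg])
        y = np.r_[np.ones(len(X_pos)), np.zeros(len(X_neg))]
        clf.fit(X, y)
        hard = neg_pool[clf.decision_function(neg_pool) > 0]  # false positives
        if len(hard) == 0:
            break
        X_neg = np.vstack([X_neg, hard])  # may re-add some; fine for a sketch
    return clf

# Toy separable data standing in for window features.
rng = np.random.default_rng(1)
X_pos = rng.normal(loc=1.0, scale=0.3, size=(30, 2))
neg_pool = rng.normal(loc=-1.0, scale=0.3, size=(200, 2))
clf = train_with_hard_negatives(X_pos, neg_pool)
```

A realistic pipeline would mine new windows from whole images each round rather than rescoring a fixed pool, but the refit-on-false-positives structure is the same.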
    Performance evaluation. Naturally, the more the bounding boxes computed by the algorithm overlap the ground truth, the better the performance. The metric used is ABO (Average Best Overlap). For a fixed class c, each ground-truth annotation is written g_i ∈ G^c and the computed position hypotheses are the l_j ∈ L; the ABO is then:
    ABO = (1 / |G^c|) sum_{g_i ∈ G^c} max_{l_j ∈ L} Overlap(g_i, l_j)
    where Overlap(g_i, l_j) = area(g_i ∩ l_j) / area(g_i ∪ l_j).

    The result above is the ABO for a single class; to evaluate performance over all categories, the mean of the per-class ABOs, MABO (Mean Average Best Overlap), is naturally used.
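The metric in code, with boxes written as (x1, y1, x2, y2); the function names are mine:

```python
def overlap(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def abo(ground_truths, hypotheses):
    """Average Best Overlap for one class: best IoU per ground-truth box,
    averaged over the class's ground-truth set."""
    return (sum(max(overlap(g, l) for l in hypotheses) for g in ground_truths)
            / len(ground_truths))

def mabo(gt_by_class, hypotheses):
    """Mean ABO across classes."""
    return sum(abo(g, hypotheses) for g in gt_by_class.values()) / len(gt_by_class)

# One ground-truth box, two hypotheses: the better one covers half of it.
score = abo([(0, 0, 10, 10)], [(0, 0, 5, 10), (20, 20, 30, 30)])  # -> 0.5
```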
