Regionlets for Generic Object Detection
This article is a translation of the paper below, combined with my own understanding. Paper: http://download.csdn.net/detail/autocyz/8569687
Summary:
For generic object detection, the current problem is how to handle the recognition difficulties caused by changes in an object's viewpoint while keeping the computation comparatively simple. Solving this requires a flexible object representation that can still be evaluated reliably for objects at different positions.
Based on this, the author uses a cascaded boosting classifier to build an object classification model. The model contains different feature types, which are obtained by computing features over local areas. These local areas are called regionlets.
A regionlet is a basic feature extraction area, defined proportionally to the detection window so that it applies at any resolution. Regionlets with stable relative positions are organized into groups to describe the texture distribution within an object.
To adapt to the deformation of an object, the author aggregates the features of the regionlets in a group into a one-dimensional feature. The author then evaluates candidate bounding boxes of the object, obtained from segmentation, which limits the number of evaluated locations to the thousands.
Introduction:
Although the detection of rigid objects (objects whose shape changes little or not at all) has achieved considerable success, many problems remain in generic object detection. The main one is still the recognition difficulty caused by object deformation. There are two causes. One is deformation of the object itself: a cat, for example, changes its appearance as it moves. The other is a change in viewing angle or distance: a vehicle does not change shape, but it looks different when seen from different angles and distances.
These problems point to a more fundamental issue in how object classes are represented. On the one hand, a finely detailed template that describes rigid objects well is hardly suitable for deformable objects. On the other hand, a template with good deformation tolerance may lead to imprecise localization or a higher error rate when detecting rigid objects.
The paper proposes a new representation strategy for generic objects, which integrates deformation handling into both classifier learning and feature extraction. A cascaded boosting classifier is used to classify object bounding boxes. Each weak classifier in the boosting cascade takes as input the feature response of a region inside the box, and each region is in turn represented by a group of sub-regions called regionlets. These regionlets are not chosen at random; boosting selects them from a huge pool of candidates.
On the one hand, the relative positions of the regionlets within a region, and of the region within the object bounding box, are stable, so the regionlet representation can model fine-grained spatial layout. On the other hand, the feature responses of each group of regionlets are aggregated into a one-dimensional feature, which is more robust to local deformation.
In addition, to make the regionlet model more flexible, the author uses regionlets of different sizes and aspect ratios. The author also takes advantage of the selective search strategy, so that the number of candidate boxes is on the order of thousands, far fewer than with sliding-window methods.
The paper makes two main contributions: 1. It proposes the regionlet method, which can flexibly extract features from arbitrary bounding boxes. 2. For a class of objects, the regionlet-based representation not only models the relative spatial layout inside the object, but also adapts well to object variation, especially deformation, by combining two mechanisms: selecting regionlets with boosting, and aggregating the feature responses of a group of regionlets.
Definition of regionlet:
For object detection, an object class is essentially defined by its classifier, which captures both the appearance and the spatial layout of the object.
The appearance of an object is usually extracted from the rectangular area containing it. Inside the object, features extracted from a small rectangle are well localized but handle deformation poorly. Features extracted from a large rectangle tolerate deformation better but cannot localize precisely. Moreover, a large rectangle may still fail to capture the object's features under large variations, especially deformation, because the rectangle may contain information that is useless or even distracting.
For these reasons, the author looks for sub-regions, the regionlets, to serve as the basic units of feature extraction, and then organizes them into groups. Such a group of features can describe different objects more flexibly and also tolerates deformation well.
Take the diagram above as an example. The first column shows the object to be detected, a person. In the second column, the black box marks the original detection window, and the blue box inside it is the region R from which features are extracted; here it mainly covers the upper body. The orange rectangle inside the blue box is a regionlet, the sub-region from which the feature is actually extracted. Here the regionlet sits at the position of the hand: the person's pose changes, but the hand itself deforms only slightly. The three regionlets from the second column are combined into R1, R2, R3 in the last picture, forming the group of regionlets.
Let us analyze this picture carefully. First, the choice of regionlets: here the hand is chosen as representative. The body is deformed in all three pictures, and the main source of deformation is the change in hand position. Note that although the position of the hand changes, the hand itself deforms relatively little in the picture. This choice of regionlets is ingenious: each individual regionlet is highly representative and captures a salient characteristic of the person, while the group as a whole can extract the hand feature accurately wherever the hand is, and so copes with deformation. Of course, this is only a simplified illustration. In the actual algorithm, a region R contains more than one regionlet, and regionlet locations are not determined by analyzing human characteristics in this way. How the regionlets are determined is described later.
Extraction of features in a region:
Extracting features from a region R takes two main steps:
Step one: extract the HOG and LBP features of each regionlet.
Step two: combine the features extracted from the regionlets.
The first step is relatively simple and is not covered here; the second step is described in detail below.
Combining the features extracted from the regionlets is in effect a feature selection process, which picks the most representative feature among the regionlets to represent the region.
For example, the author first extracts a low-dimensional feature vector from each regionlet, and a boosting learner then selects the most discriminative dimension. In the example, the first dimension turns out to be the most discriminative, because its value in the regionlet containing the hand is significantly higher than in the other two regionlets. Finally, the author takes the strongest first-dimension response among the three regionlets as the feature of the whole region R.
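The combination step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `region_feature` and the feature values are made up, and the dimension index `k` is taken as given rather than learned by boosting.

```python
from typing import Sequence


def region_feature(regionlet_features: Sequence[Sequence[float]], k: int) -> float:
    """Return the strongest response of dimension k across a group of regionlets.

    Hypothetical helper: in the text, k is the dimension selected by boosting;
    here it is simply passed in.
    """
    if not regionlet_features:
        raise ValueError("at least one regionlet is required")
    return max(features[k] for features in regionlet_features)


# Made-up feature vectors for three regionlets (illustrative values only).
regionlets = [
    [0.9, 0.2, 0.1],  # regionlet covering the hand: strong response in dimension 0
    [0.1, 0.3, 0.2],
    [0.2, 0.1, 0.3],
]

# The region R is represented by a single number: the largest
# dimension-0 response among its regionlets.
pooled = region_feature(regionlets, k=0)
print(pooled)  # 0.9
```

Because only the maximum is kept, the result does not depend on which of the regionlets contains the hand, which is what makes the one-dimensional feature tolerant to local deformation.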
Scale normalization of the detection window:
The author's regionlet method is applied to object candidate boxes. The candidate boxes are generated as described in: K. E. Van de Sande, J. R. R. Uijlings, T. Gevers, and A. W. M. Smeulders. Segmentation as selective search for object recognition. In ICCV, 2011. The details are not repeated here.
Once the candidate boxes have been obtained with the above method, a detection window is applied to each candidate region. Before detection, the author normalizes the scale of the detection window as follows:
As shown in figure (a), consider a small candidate window of size (h, w), in which a detection box with coordinates (l, t, r, b) is evaluated. For a larger candidate window of size (h', w'), as in figure (b), reusing the same coordinates (l, t, r, b) would clearly change the box's relative position inside the window, which violates the stable relative positions that regionlets rely on. Therefore, the author first normalizes the coordinates in figure (a) by the window size, obtaining (l/w, t/h, r/w, b/h). When the window grows to size (h', w'), the detection box becomes (l·w'/w, t·h'/h, r·w'/w, b·h'/h). With this normalization, the same boxes can be applied directly to windows of different sizes.
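The normalization above is simple arithmetic, and a small Python sketch makes it concrete. The function names `normalize_box` and `rescale_box` and the example sizes are assumptions chosen for illustration; they do not come from the paper.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (l, t, r, b)


def normalize_box(box: Box, w: float, h: float) -> Box:
    """Divide horizontal coordinates by the window width and vertical ones by its height."""
    l, t, r, b = box
    return (l / w, t / h, r / w, b / h)


def rescale_box(norm_box: Box, new_w: float, new_h: float) -> Box:
    """Map a normalized box into a window of size (new_h, new_w)."""
    nl, nt, nr, nb = norm_box
    return (nl * new_w, nt * new_h, nr * new_w, nb * new_h)


# A box inside a small window of width 64 and height 128 (made-up numbers).
small_box = (16.0, 32.0, 48.0, 96.0)
norm = normalize_box(small_box, w=64.0, h=128.0)
print(norm)  # (0.25, 0.25, 0.75, 0.75)

# The same relative box inside a window twice as large.
large_box = rescale_box(norm, new_w=128.0, new_h=256.0)
print(large_box)  # (32.0, 64.0, 96.0, 192.0)
```

The box keeps the same relative position in both windows, which is the property the normalization is meant to guarantee.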
Building the region and regionlet pool:
The author builds a complete pool of regions and regionlets, covering different sizes, positions, and aspect ratios. It is constructed as follows:
In this construction, R' = (l', t', r', b', k), where k denotes the k-th element of the region's low-dimensional feature vector.
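To make the tuple R' = (l', t', r', b', k) concrete, here is a minimal Python sketch of such a pool. It is not the author's sampling procedure: the `Region` type, the `build_region_pool` function, and the simple enumeration over a few hand-picked sizes and positions are all assumptions for illustration, with coordinates expressed in normalized window units.

```python
from itertools import product
from typing import List, NamedTuple, Sequence, Tuple


class Region(NamedTuple):
    """Hypothetical container for R' = (l', t', r', b', k) in normalized coordinates."""
    l: float
    t: float
    r: float
    b: float
    k: int  # index into the region's low-dimensional feature vector


def build_region_pool(
    sizes: Sequence[Tuple[float, float]],      # (width, height) pairs
    positions: Sequence[Tuple[float, float]],  # (left, top) corners
    num_dims: int,
) -> List[Region]:
    """Enumerate every size/position/feature-index combination that fits the window."""
    pool = []
    for (w, h), (x, y), k in product(sizes, positions, range(num_dims)):
        r, b = x + w, y + h
        if r <= 1.0 and b <= 1.0:  # keep only regions inside the normalized window
            pool.append(Region(x, y, r, b, k))
    return pool


# Different sizes and aspect ratios at a few positions (made-up values).
sizes = [(0.25, 0.25), (0.5, 0.25), (0.25, 0.5)]
positions = [(0.0, 0.0), (0.5, 0.5), (0.75, 0.75)]
pool = build_region_pool(sizes, positions, num_dims=2)
print(len(pool), pool[0])
```

In the method described above, boosting would then pick the useful entries out of such a pool; that selection step is not shown here.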