Analysis of the deformable part model in machine learning

Last Update:2018-07-26 Source: Internet

Author: User

Object detection based on deformable part models (DPM)

Cascade-based object detection uses the cascade idea to quickly discard sliding windows that contain no target, which greatly improves detection efficiency. It is not without shortcomings, though. It relies on very weak features, and the classifiers built on them are weak classifiers, only slightly better than random guessing. Its precision depends on chaining many weak classifiers under a one-vote veto (a window is accepted only if every classifier accepts it) to raise the hit rate, and choosing the number of classifiers is itself an empirical matter.

This section is about better features, ones that improve detection for as many kinds of objects as possible. Features learned by deep learning are of course very effective, but today I will keep following other methods in the order the papers were published (my server is not configured yet, so I cannot run deep learning for now ^.^). In the fourth section I covered ASM and briefly mentioned AAM; both are in fact deformable models. When it comes to object detection based on deformable models, the expert who must be mentioned is Pedro F. Felzenszwalb, a professor at the University of Chicago. Pedro has published a number of papers on object detection based on deformable parts and received a lifetime achievement award from the VOC organizing committee for this work. His earlier paper on belief propagation for early vision is also well known: it does not open up a new field, but it greatly improves the efficiency of the BP algorithm without sacrificing accuracy. Note that BP here is not the back-propagation algorithm of neural networks but belief propagation, an inference method for probabilistic graphical models (it computes the maximum a posteriori solution). It is also used in the Hough-inference-based object detection discussed later.

Pedro seems to be very good at this kind of thing: his other paper, "Cascade Object Detection with Deformable Part Models", makes detection with deformable parts about 20 times faster without sacrificing accuracy. Today we study object detection based on deformable parts.

Apart from deep learning, object detection based on deformable parts is currently one of the better detection methods. First, why use deformable parts? (Figure I) shows the same person in different poses. Which of the methods from the previous sections could detect a person across all these poses? Thresholding certainly will not work. The generalized Hough transform? A person's pose can vary endlessly, so far too many templates would be needed. Hough forest voting? That looks feasible, but Hough forests use image patches as features, which suits only objects with little deformation and works poorly when the deformation within a patch is large. What about ASM? Like the generalized Hough transform, it would require too many mean-shape templates. The root of the problem is that we have neither a good way to describe shape nor good features. Almost every one of Pedro's papers improves the shape description, evolving from a simple representation to a grammar-based one. The evolution can be traced in reference [4], the doctoral dissertation of Ross Girshick, Pedro's student.

(Figure I)

Since none of the methods from the previous sections can detect targets with large deformation, it is time for detection based on deformable parts to take the stage. Pedro has five top papers on object detection; I will not go through them one by one, but pick three from the reference list to study. References [1], [2], and [3] describe, respectively, how to describe an object with a deformable model (the feature stage), how to use deformable parts for detection (the feature processing and classification stage), and how to speed up detection. We start with the deformable part model of reference [1]. In a deformable part model, an object is represented by its parts together with the positional relationships between them (parts + deformable configuration). In fact, the part model was proposed as early as 1973, in the article "The Representation and Matching of Pictorial Structures".

(Figure II)

In a part model, we represent an object by a collection of parts and the connections between them. (Figure II) shows the classical spring model, in which the parts of the object are connected by springs. We define an energy function that is the sum of two terms: how well each part matches the image, and how much the connections between parts are deformed (think of the stretching of the springs). The best match of the model to an image is the configuration that minimizes this energy.

Formally, the model of an object is a graph G = (V, E), where V = {v_1, ..., v_n} are the n parts and an edge (v_i, v_j) ∈ E is a connection between two parts. A configuration of an object instance is L = (l_1, ..., l_n), where l_i is the location of part v_i. (A configuration can be understood simply as the placement of the parts in the image; in practice it may also include other properties of each part.) Given an image, m_i(l_i) measures the mismatch between the template of part v_i and the image when v_i is placed at location l_i, and d_ij(l_i, l_j) measures how much the model is deformed when v_i and v_j are placed at l_i and l_j. The optimal configuration of the model for an image is therefore the one in which every part matches well and the relative positions of the parts agree with the model as closely as possible, which corresponds exactly to the two terms of the energy. It is given by (Formula I):

(Formula I)
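
Written out from the definitions above (a reconstruction, assuming the standard pictorial-structures form used in reference [1]), Formula I reads:

```latex
L^{*} = \arg\min_{L} \left( \sum_{i=1}^{n} m_i(l_i) + \sum_{(v_i, v_j) \in E} d_{ij}(l_i, l_j) \right)
```

The first sum is the appearance mismatch of the individual parts, and the second sum is the deformation cost over all connected pairs of parts.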

Optimizing (Formula I) is the classical Markov random field (MRF) problem, which can be solved with the BP algorithm mentioned above. In theory this amounts to maximizing the posterior probability (MAP), because the random field is easily converted to a probability measure (the Gibbs measure). I will not go into that much detail here; readers who want to study the theory systematically can look into probabilistic graphical models. For recognition, we match the parts so that the total energy is minimized. This is somewhat similar to ASM, except that ASM does not use the relationships between parts and simply minimizes the sum of the matching costs of the individual points. The matching results are shown in (Figure III):

(Figure III)
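
For a tree-shaped model, the energy minimization of (Formula I) can be sketched as a small dynamic program, which is what min-sum belief propagation reduces to on a tree. This is only a minimal sketch under assumptions that are not in the article: part locations are restricted to five discrete positions on a line, the springs are quadratic, and the names `match_tree_model` and `spring` as well as all the numbers are hypothetical.

```python
import numpy as np

def match_tree_model(unary, parent, pairwise):
    """Minimise sum_i m_i(l_i) + sum_edges d_ij(l_i, l_j) on a tree.

    unary[i][l]       : mismatch cost m_i of placing part i at location l
    parent[i]         : index of the parent of part i (parent[0] == -1, parent[i] < i)
    pairwise[i][a, b] : deformation cost when the parent of i is at a and i is at b
    Returns (minimum energy, list with the best location of every part).
    """
    n = len(unary)
    cost = [np.asarray(u, dtype=float).copy() for u in unary]
    best_child = [None] * n
    # Leaves-to-root pass: each part sends its parent the cheapest cost
    # of the subtree below it, for every possible parent location.
    for i in range(n - 1, 0, -1):
        total = pairwise[i] + cost[i][None, :]   # rows: parent location, cols: own location
        best_child[i] = total.argmin(axis=1)
        cost[parent[i]] += total.min(axis=1)
    # Root-to-leaves pass: read off the optimal placement.
    placement = [0] * n
    placement[0] = int(cost[0].argmin())
    for i in range(1, n):
        placement[i] = int(best_child[i][placement[parent[i]]])
    return float(cost[0].min()), placement

locations = np.arange(5)

def spring(offset):
    """Quadratic spring: the child prefers to sit `offset` positions after its parent."""
    diff = locations[None, :] - locations[:, None] - offset
    return diff.astype(float) ** 2

# Hypothetical three-part chain: head -> torso -> legs.
unary = [np.array([3.0, 0.5, 2.0, 4.0, 4.0]),
         np.array([4.0, 3.0, 0.2, 1.0, 4.0]),
         np.array([4.0, 4.0, 3.0, 2.5, 0.1])]
parent = [-1, 0, 1]
pairwise = {1: spring(1), 2: spring(1)}

min_energy, placement = match_tree_model(unary, parent, pairwise)
print(min_energy, placement)
```

Because every part talks only to its parent, the cost grows with the number of parts times the square of the number of candidate locations, instead of exponentially as in an exhaustive search over all configurations.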

The method above uses no machine learning. In addition, searching for the parts is not easy, because the approximate locations of the parts must be found first. So this method also has shortcomings, but the deformable-part idea can serve as a feature. Next we look at how Pedro's second paper [2] uses it for object detection.

In reference [2], Pedro's detection method based on the deformable model draws on three ingredients: 1. HOG features, 2. the part model, 3. latent SVM.

1. The author characterizes each part with a HOG feature template and matches against it. A pyramid is also used, that is, HOG features are extracted at different resolutions.

2. The part model presented above is used. When detecting an object, the score of a detection window equals the matching scores of the parts minus the cost of deforming the model.

3. Training the model requires learning the HOG template of each part as well as the parameters that measure the cost of each part's displacement. The paper proposes the latent SVM method, which turns learning a deformable part model into a classification problem: the part locations are treated as latent variables, and the model parameters become the separating hyperplane of an SVM. In the actual implementation, the author updates the model iteratively.
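
The scoring rule in point 2 (part matching scores minus deformation cost) can be illustrated with a small brute-force sketch. It assumes, beyond what the article states, that each part already has a response map (the result of correlating its filter with the HOG features), that each part has an ideal anchor position, and that the deformation cost is a single weight times the squared distance from that anchor. The function names and all the numbers are hypothetical.

```python
import numpy as np

def best_part_placement(response, anchor, deform_weight):
    """Find the location maximising: filter response - deformation cost."""
    ys, xs = np.indices(response.shape)
    cost = deform_weight * ((ys - anchor[0]) ** 2 + (xs - anchor[1]) ** 2)
    net = response - cost
    idx = np.unravel_index(np.argmax(net), net.shape)
    return float(net[idx]), (int(idx[0]), int(idx[1]))

def window_score(root_score, part_responses, anchors, deform_weight, bias=0.0):
    """Score of one detection window: root score plus the best net score of every part."""
    total = root_score + bias
    placements = []
    for response, anchor in zip(part_responses, anchors):
        score, location = best_part_placement(response, anchor, deform_weight)
        total += score
        placements.append(location)
    return total, placements

# Hypothetical response maps for two parts on a 5 x 5 grid.
part_a = np.zeros((5, 5))
part_a[2, 3] = 3.0   # strong response one cell away from the anchor
part_a[0, 0] = 5.0   # even stronger response, but far from the anchor
part_b = np.zeros((5, 5))
part_b[4, 4] = 1.0   # response exactly at the anchor

total, placements = window_score(
    root_score=1.5,
    part_responses=[part_a, part_b],
    anchors=[(2, 2), (4, 4)],
    deform_weight=1.0,
)
print(total, placements)
```

In this example the first part settles on the nearby response rather than the stronger but distant one, because moving far from the anchor costs more than the extra response is worth.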

These three points raise a few questions: 1. where do the parts come from, and 2. how are the parts used for detection? Before part-based detection, the Dalal-Triggs method, which won the 2006 PASCAL VOC challenge, used HOG as the feature and applied it directly in sliding windows at different scales, much like a filter. This filter brought it brief glory, but it cannot cope with targets that deform strongly. Pedro improved on the Dalal-Triggs method. The basic score is computed as β · Φ(x), where β is the filter and Φ(x) is the feature vector. The root filter locates a root part p_0, which has its own dedicated filter, and there is a series of non-root parts p_1, ..., p_n; together they form a star structure. Recall the deformable-model idea of (Figure I) here. Each part is represented as p_i = (x_i, y_i, l_i), where (x_i, y_i) are the coordinates and l_i is the pyramid level. Matching is complete when the matching score of the star structure minus the cost of deforming the model is highest, as shown in (Formula II):

(Formula II)
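
Written out in the notation of reference [2], the score of a star-structured hypothesis reads as follows. This is a reconstruction: the deformation weights d_i, the displacement (dx_i, dy_i) of part i from its anchor position, and the displacement features φ_d are taken from that paper, not from the text here.

```latex
\mathrm{score}(p_0, \ldots, p_n)
  = \sum_{i=0}^{n} F'_i \cdot \phi(H, p_i)
  - \sum_{i=1}^{n} d_i \cdot \phi_d(dx_i, dy_i)
  + b,
\qquad
\phi_d(dx, dy) = (dx,\ dy,\ dx^2,\ dy^2)
```

The first sum collects the filter responses of the root and the parts, and the second sum is the cost of moving each part away from its anchor.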

Here F' is the vectorized form of the filter, b is the bias term, and H is the feature pyramid. Suppose the filters are known: then the parts can be located and the match completed, which answers the second question. But where do the filters come from; put simply, what are the filter weights β? At first we know neither the parts nor the filters. Without filters the parts cannot be located, and without parts the filter parameters cannot be estimated. This is the typical situation the EM algorithm is designed for, but the author does not use EM. Instead he uses the latent SVM method. A latent variable is similar to a factor in factor analysis in statistics; here the latent variables are the part locations. During training, some parts are labeled first and used to learn β, and β is then used to find the latent parts. The two steps are alternated in a coordinate descent iteration, a solution scheme we have met before. Once the parts and the scoring are in place, what remains is the search for the best combination of the root and the other parts. Dynamic programming can be used for this, although it is slow; see reference [2] for details.
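
The alternation described above can be sketched on a toy problem. This is not the training code of reference [2]; it is a minimal illustration under several assumptions: every example is a small "bag" of candidate feature vectors (one per latent choice, for instance one per part placement), the score of an example is the maximum of w · φ over its candidates, the inner step is plain subgradient descent on the hinge loss, and the starting weights `w_init` are a rough hand-picked guess. All names and numbers are hypothetical.

```python
import numpy as np

def bag_scores(w, bag):
    """Score of every latent choice z of one example: w . phi(x, z)."""
    return np.asarray(bag, dtype=float) @ w

def train_latent_svm(pos_bags, neg_bags, w_init, rounds=3, steps=50, lr=0.1, lam=0.01):
    w = np.asarray(w_init, dtype=float).copy()
    n = len(pos_bags) + len(neg_bags)
    latent = []
    for _ in range(rounds):
        # Step 1: keep w fixed and pick the best latent choice for each positive.
        latent = [int(np.argmax(bag_scores(w, bag))) for bag in pos_bags]
        pos_feats = [np.asarray(bag, dtype=float)[z] for bag, z in zip(pos_bags, latent)]
        # Step 2: keep those choices fixed and reduce the regularised hinge loss.
        for _ in range(steps):
            grad = lam * w
            for phi in pos_feats:
                if w @ phi < 1.0:                    # positive inside the margin
                    grad -= phi / n
            for bag in neg_bags:
                bag = np.asarray(bag, dtype=float)
                phi = bag[int(np.argmax(bag @ w))]   # highest-scoring choice of the negative
                if w @ phi > -1.0:                   # negative inside the margin
                    grad += phi / n
            w = w - lr * grad
    return w, latent

# Toy data: each feature vector is (appearance value, constant 1 for the bias).
# Every positive bag hides one candidate with a large appearance value.
pos_bags = [[[-2.0, 1.0], [2.0, 1.0]],
            [[2.2, 1.0], [-1.8, 1.0]]]
neg_bags = [[[-2.0, 1.0], [-1.5, 1.0]],
            [[-1.7, 1.0], [-2.2, 1.0]]]

w, latent = train_latent_svm(pos_bags, neg_bags, w_init=[0.1, 0.0])
print(w, latent)
```

With the positives' latent choices fixed, step 2 is an ordinary convex SVM problem, which is what makes the alternation practical. The result depends on the starting point, which is why a reasonable initial guess is assumed here.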

Although reference [2] uses a pyramid to speed up the search, matching all combinations of the star structure is still computationally heavy, and detection is somewhat slow. Hence the third paper [3], which accelerates detection by evaluating the star-structured model as a cascade and quickly discarding hypotheses that carry no useful information. The location of the root part plays a major role in matching; the other parts follow in turn, n+1 parts in total. Given this ordering, after evaluating a subset of the parts we can prune in cascade fashion, throwing away combinations of parts that are poorly arranged (the official term is configuration), so that only combinations that score well under the weak classifier move on. This resembles the cascade idea of Cascade, but note that the parts of a deformable model are related to each other, unlike the independent Haar-like features discussed earlier, so simply judging them one after another does not work here. This is really a sub-sequence matching problem. Reference [7] proposed a solution, and Pedro improved on it by adding, on top of the original n+1 parts, another n+1 simplified parts that can be computed quickly. With this change, sub-sequence matching becomes much cheaper.
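
The pruning idea can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of reference [3]: it assumes the parts are evaluated in a fixed order, that each part contributes a single number (its response minus its deformation cost), and that one threshold per stage is already given. The function name, the thresholds, and the scores are hypothetical.

```python
def evaluate_cascade(part_score_fns, thresholds, final_threshold):
    """Evaluate parts one by one and stop as soon as the partial score is too low.

    part_score_fns : zero-argument callables, root first, each returning one part's score
    thresholds     : one pruning threshold per stage for the accumulated score
    Returns (accepted, accumulated score, number of parts actually evaluated).
    """
    total = 0.0
    stages = zip(part_score_fns, thresholds)
    for evaluated, (score_fn, threshold) in enumerate(stages, start=1):
        total += score_fn()
        if total < threshold:
            return False, total, evaluated   # pruned: remaining parts are never computed
    return total >= final_threshold, total, len(part_score_fns)

calls = []

def make_part(name, value):
    """Wrap a fixed score in a callable that records when it is evaluated."""
    def score():
        calls.append(name)
        return value
    return score

thresholds = [0.0, 1.0, 2.0]
good = [make_part("root", 2.0), make_part("part1", 1.0), make_part("part2", 0.5)]
bad = [make_part("root", -1.0), make_part("part1", 5.0), make_part("part2", 5.0)]

print(evaluate_cascade(good, thresholds, final_threshold=3.0))
print(evaluate_cascade(bad, thresholds, final_threshold=3.0))
```

The second hypothesis is rejected after the root alone, so its remaining parts are never evaluated. The saving comes from the fact that most windows in an image are rejected this early.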

Now let us look formally at how the detection process is accelerated. The overall procedure is shown in (Figure IV):

(Figure IV)

The meaning of each symbol is given in (Figure V). Note that p is not the part mentioned above, but the contribution of part v_i:

(Figure V)

That covers most of the ideas behind part-based detection, but there are further tricks not discussed here, such as how to select the thresholds and how to compute the simplified parts.

The detection results are shown in (Figure VI):

(Figure VI)

These are study notes, so mistakes are inevitable; if you find any, please point them out. Thank you. All the papers and code mentioned in this section are available on Pedro's home page.

References:

[1] Pictorial Structures for Object Recognition. P. F. Felzenszwalb et al.

[2] Object Detection with Discriminatively Trained Part Based Models. P. F. Felzenszwalb et al.

[3] Cascade Object Detection with Deformable Part Models. P. F. Felzenszwalb et al.

[4] From Rigid Templates to Grammars: Object Detection with Structured Models. R. B. Girshick

[5] Histograms of Oriented Gradients for Human Detection. N. Dalal and B. Triggs

[6] http://bubblexc.com/y2011/422/

[7] A Computational Model for Visual Selection. Y. Amit and D. Geman

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
