Introduction
The task of supervised learning is to learn a model (that is, a target function) and use it to predict the output for a given input. The model generally takes the form of a decision function y = f(x) or a conditional probability distribution P(y|x).
Supervised learning methods can be divided into generative approaches and discriminative approaches. The models they learn are called generative models and discriminative models, respectively.
Decision function and conditional probability distribution
Decision function y = f(x)
Given an input x, the decision function outputs a value y, which is compared with a threshold; the result of the comparison determines the category of x. For example, in a two-class problem with classes w1 and w2, x belongs to class w1 if y is greater than the threshold and to class w2 if y is less than the threshold. This yields the category corresponding to x.
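The thresholding rule described above can be sketched in a few lines of Python. This is only an illustration: the linear form of the score, the names decision_function and classify, and the values of weight, bias and threshold are all assumptions made up for the example.

```python
def decision_function(x, weight=2.0, bias=-1.0):
    """Hypothetical linear score f(x) = weight * x + bias."""
    return weight * x + bias


def classify(x, threshold=0.0):
    """Assign class w1 if f(x) exceeds the threshold, otherwise w2."""
    return "w1" if decision_function(x) > threshold else "w2"


print(classify(1.5))  # f(1.5) = 2.0, above the threshold
print(classify(0.2))  # f(0.2) = -0.6, below the threshold
```

Any real-valued scoring function could stand in for the linear one; the point is only that the class comes from comparing the score with a threshold.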
Conditional probability distribution P(y|x)
Given an input x, the model compares the probabilities that x belongs to each class and outputs the class with the highest probability as the category of x. For example, if P(w1|x) is greater than P(w2|x), we conclude that x belongs to class w1.
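As a minimal sketch of this rule, assume the class probabilities for one input are already available as a dictionary; the function name predict_class and the probability values are hypothetical.

```python
def predict_class(posteriors):
    """Return the class label with the highest P(class | x)."""
    return max(posteriors, key=posteriors.get)


# Hypothetical posterior probabilities for a single input x.
posteriors = {"w1": 0.7, "w2": 0.3}
print(predict_class(posteriors))
```

The same function works unchanged for more than two classes, since it simply picks the largest entry.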
Summary
Both kinds of model can predict the output y for a given input x. In fact, a conditional probability distribution P(y|x) implicitly defines a decision function y = f(x), for example by choosing the y with the highest probability.
Conversely, and perhaps surprisingly, the decision function y = f(x) also implicitly uses P(y|x). A decision function is typically obtained by a learning algorithm that minimizes the squared error between its predictions and the training data. Bayesian analysis tells us that, although such an algorithm does not explicitly use Bayes' rule or compute any probabilities, it implicitly outputs the maximum likelihood hypothesis, provided the noise in the training targets is Gaussian. When all hypotheses have equal prior probability, the maximum likelihood hypothesis is also the maximum a posteriori (MAP) hypothesis. In other words, the learner's task is to output the maximum likelihood hypothesis under the condition that all candidate hypotheses are equally probable a priori.
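The link between squared error and maximum likelihood can be written out explicitly. The derivation below is a sketch under an assumption the text does not spell out: each training target is y_i = f(x_i) + e_i, where the noise terms e_i are independent, Gaussian, zero-mean and share a fixed variance sigma^2. The symbols h (a candidate hypothesis), H (the hypothesis space), D (the training data) and n (the number of samples) are introduced here only for illustration.

```latex
\begin{aligned}
h_{\mathrm{ML}} &= \arg\max_{h \in H} \prod_{i=1}^{n} p(y_i \mid x_i, h) \\
&= \arg\max_{h \in H} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
   \exp\!\left(-\frac{(y_i - h(x_i))^2}{2\sigma^2}\right) \\
&= \arg\max_{h \in H} \sum_{i=1}^{n} -\frac{(y_i - h(x_i))^2}{2\sigma^2} \\
&= \arg\min_{h \in H} \sum_{i=1}^{n} (y_i - h(x_i))^2 \\[4pt]
h_{\mathrm{MAP}} &= \arg\max_{h \in H} P(D \mid h)\, P(h) = h_{\mathrm{ML}}
   \quad \text{when } P(h) \text{ is the same for every } h
\end{aligned}
```

The third line takes the logarithm and drops terms that do not depend on h, which is why the constant factor and the variance disappear from the final minimization.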
Generative methods and generative models
Generative model: infinite samples ==> probability density model = generative model ==> prediction
The generative method learns the joint probability distribution P(x, y) from the data and then derives the conditional probability distribution P(y|x) = P(x, y) / P(x) as the model used for prediction. The method is called generative because the model represents the generative relationship by which a given input x produces the output y. Put another way, a generative model describes how the observed values are randomly generated, in particular given certain hidden parameters. Typical generative models include naive Bayes, Markov models, and Gaussian mixture models. These methods are generally grounded in statistics and Bayesian theory.
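The two steps described above, estimating P(x, y) and then dividing by P(x), can be sketched with a tiny discrete example. The data set, the function names fit_joint and conditional, and the use of relative frequencies as the estimator are assumptions chosen for illustration; real generative models usually fit a parametric density instead.

```python
from collections import Counter

# Hypothetical toy data: (x, y) pairs with a discrete feature x and label y.
data = [("sunny", "play"), ("sunny", "play"), ("sunny", "stay"),
        ("rainy", "stay"), ("rainy", "stay"), ("rainy", "play")]


def fit_joint(samples):
    """Estimate the joint distribution P(x, y) by relative frequency."""
    counts = Counter(samples)
    total = len(samples)
    return {pair: n / total for pair, n in counts.items()}


def conditional(joint, x):
    """Compute P(y | x) = P(x, y) / P(x), where P(x) sums P(x, y) over y."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}


joint = fit_joint(data)
print(conditional(joint, "sunny"))
```

Because the full joint table is kept, the same model can also answer questions about P(x) alone, which a model of P(y|x) by itself cannot.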
Characteristics of the generative method
- From a statistical point of view, it captures the distribution of the data and so reflects the similarity among data of the same class;
- The generative method can recover the joint probability distribution, whereas the discriminative method cannot;
- The generative method converges faster during learning: as the sample size increases, the learned model converges to the true model more quickly;
- When hidden variables are present, the generative method can still be used for learning, whereas the discriminative method cannot.
Discriminative method and discriminative model
Discriminative model: finite samples ==> discriminant function = predictive model ==> prediction
The discriminative method learns the decision function f(x) or the conditional probability distribution P(y|x) directly from the data and uses it as the predictive model, that is, the discriminative model. The discriminative method is concerned with which output y should be predicted for a given input x. Typical discriminative models include the k-nearest neighbor method, the perceptron, decision trees, logistic regression, the maximum entropy model, support vector machines, boosting methods, and conditional random fields. A discriminative model uses positive and negative examples together with their class labels, and focuses on the boundary between the classes.
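As an illustration of learning a decision function directly from labeled data, here is a minimal perceptron sketch in plain Python. The toy data, the function names, and the choices of epochs and lr are assumptions for the example, and the perceptron stands in for any of the discriminative models listed above.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Learn w and b so that sign(w . x + b) matches labels in {-1, +1}."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or exactly on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical linearly separable toy data: (features, label).
train = [((2.0, 1.0), 1), ((3.0, 2.0), 1), ((-1.0, -1.0), -1), ((-2.0, 0.0), -1)]
w, b = train_perceptron(train)
print([predict(w, b, x) for x, _ in train])
```

Note that nothing here models how the inputs x are distributed; only the boundary separating the two labels is learned.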
Characteristics of the discriminative method
- The discriminative method looks for the optimal separating surface between categories and reflects the differences between data of different classes;
- The discriminative method uses the class label information of the training data and directly learns the conditional probability P(y|x) or the decision function f(x). Because it targets prediction directly, it often achieves higher accuracy;
- Because it learns P(y|x) or f(x) directly, the data can be abstracted to varying degrees and features can be defined and used, which simplifies the learning problem;
- The disadvantage is that it does not reflect the characteristics of the training data itself.
Comparison of discriminative and generative models
(1) The two optimize different criteria during training
The generative model optimizes the joint probability of the training data.
The discriminative model optimizes the conditional probability of the training data, which corresponds well to sequence labeling problems.
(2) Different processing of observation sequences
In the generative model, the observation sequence is part of the model;
In the discriminative model, the observation sequence serves only as a condition, so flexible features can be designed over the observation sequence.
(3) Different training complexity
The training complexity of the discriminative model is higher.
(4) Whether unsupervised training is supported
The generative model supports unsupervised training.
(5) Essential difference
The discriminative model estimates the conditional probability distribution p(class|context).
The generative model estimates the joint probability distribution p(class, context).
In addition, a discriminative model can be derived from a generative model, but a generative model cannot be derived from a discriminative model.
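The one-way relationship stated above can be checked with a small numerical sketch. The two joint tables below are hypothetical and were chosen so that they differ only in P(x): both produce exactly the same conditional P(y|x), which shows that the conditional alone does not determine the joint.

```python
def conditional_from_joint(joint):
    """Derive P(y | x) from a joint table {(x, y): probability}."""
    p_x = {}
    for (x, _), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
    return {(x, y): p / p_x[x] for (x, y), p in joint.items()}


# Two hypothetical joint distributions with different marginals P(x).
joint_a = {("a", 0): 0.4, ("a", 1): 0.1, ("b", 0): 0.1, ("b", 1): 0.4}
joint_b = {("a", 0): 0.16, ("a", 1): 0.04, ("b", 0): 0.16, ("b", 1): 0.64}

cond_a = conditional_from_joint(joint_a)
cond_b = conditional_from_joint(joint_b)

# Both joints give the same conditional, so P(y | x) cannot recover P(x, y).
same = all(abs(cond_a[k] - cond_b[k]) < 1e-9 for k in cond_a)
print(same)
```

Going from the joint to the conditional is a single division; going back would require P(x), which the discriminative model never learns.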
In tracking algorithms
I previously used the CamShift method for face tracking and came across this description of tracking algorithms, so I record it here.
Tracking algorithms can generally be divided into two categories: those based on a generative appearance model and those based on a discriminative appearance model.
Generative model: typically, a model representing the target is learned first and then used to search image regions for the one that minimizes the reconstruction error. In other words, the generative model describes the target, and tracking becomes pattern matching: the image region that best matches the model is taken to be the target.
Discriminative model: the tracking problem is treated as a binary classification problem, and the goal is to find the decision boundary between the target and the background. How the target itself is described does not matter; the model only needs to know what distinguishes the target from the background. Given an image region, it checks which side of the boundary the region falls on and classifies it accordingly.
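The generative style of tracking can be illustrated with a deliberately simplified sketch: a fixed template plays the role of the learned target model, and the sum of squared differences stands in for the reconstruction error. The one-dimensional "image", the template values, and the function name best_match are all assumptions for the example; a real tracker would search two-dimensional regions and update its model over time.

```python
def best_match(image_row, template):
    """Return (offset, error) where the template fits with the smallest squared error."""
    best_offset, best_error = 0, float("inf")
    for offset in range(len(image_row) - len(template) + 1):
        window = image_row[offset:offset + len(template)]
        error = sum((p - t) ** 2 for p, t in zip(window, template))
        if error < best_error:
            best_offset, best_error = offset, error
    return best_offset, best_error


# Hypothetical 1-D "image" and target template (pixel intensities).
row = [10, 12, 11, 80, 90, 85, 13, 9]
template = [82, 88, 86]
print(best_match(row, template))
```

A discriminative tracker would instead train a classifier on target and background patches and score each candidate window with it, without needing a description of the target itself.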
Resources
Statistical Learning Methods, Hang Li, Tsinghua University Press
CSDN blog: Generative model and discriminative model
If you reprint this article, please credit the author, Jason Ding, and cite the source
GitHub home page (http://jasonding1354.github.io/)
CSDN Blog (http://blog.csdn.net/jasonding1354)
Jianshu homepage (http://www.jianshu.com/users/2bd9b48f6ea8/latest_articles)
"Machine Learning Basics": generative models and discriminative models