I ran into some simple, basic concepts, thought them over, and realized I didn't fully understand them. The following is a summary of what I found:
[Abstract]
-Generative model: infinite samples → probability density model = generative model → prediction
-Discriminative model: finite samples → discriminant function = predictive model → prediction
[Overview]
Simply put, let o be the observed value and q the model (class).
If you model P(o | q), it is a generative model. The basic idea is to first build a probability density model of the samples and then use that model for inference and prediction. It requires the available samples to be infinite, or at least as many as possible.
This approach is generally built on statistical mechanics and Bayes theory.
If you model the conditional (posterior) probability P(q | o), it is a discriminative model. The basic idea is to build a discriminant function from a finite sample, without modeling how samples are generated, and to study the prediction model directly. The representative theory is statistical learning theory. The two are related by Bayes' rule: P(q | o) = P(o | q) P(q) / P(o).
These days the two approaches overlap considerably.
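To make the contrast concrete, here is a minimal Python sketch (my own illustration, not from the sources cited below) that fits both kinds of model to the same one-dimensional, two-class toy data. The generative route estimates P(o | q) for each class plus the class prior and inverts with Bayes' rule; the discriminative route fits P(q | o) directly with a tiny hand-rolled logistic regression. All numbers are illustrative.

import numpy as np

rng = np.random.default_rng(0)
# Class 0 ~ N(-1, 1), class 1 ~ N(+1, 1)
x = np.concatenate([rng.normal(-1, 1, 100), rng.normal(+1, 1, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Generative route: model P(o | q) per class, then invert with Bayes' rule.
def gaussian_pdf(v, mu, sigma):
    return np.exp(-0.5 * ((v - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

prior = np.array([np.mean(y == c) for c in (0, 1)])    # P(q)
mu = np.array([x[y == c].mean() for c in (0, 1)])      # per-class mean
sigma = np.array([x[y == c].std() for c in (0, 1)])    # per-class std

def generative_posterior(v):                           # P(q | o) via Bayes' rule
    lik = np.array([gaussian_pdf(v, mu[c], sigma[c]) for c in (0, 1)])
    joint = prior * lik
    return joint / joint.sum()

# Discriminative route: model P(q | o) directly (logistic regression).
w, b = 0.0, 0.0
for _ in range(2000):                                  # plain gradient ascent
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))             # predicted P(q=1 | o)
    w += 0.1 * np.mean((y - p) * x)
    b += 0.1 * np.mean(y - p)

v = 0.5
print("generative P(q=1 | o=0.5):", generative_posterior(v)[1])
print("discriminative P(q=1 | o=0.5):", 1.0 / (1.0 + np.exp(-(w * v + b))))

Both routes end up with a posterior over q, but only the generative one also knows the density of o itself.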
[Discriminative Model] -- inter-class probabilistic description
Also called a conditional model or conditional probability model. It estimates the conditional probability distribution p(class | context).
It uses positive and negative examples together with their class labels, focusing on the boundary between classes; the objective function corresponds directly to classification accuracy.
-Main features:
Finds the optimal classification surface between different categories, reflecting the differences between different kinds of data.
-Advantages:
The classification boundary is more flexible than with a pure probabilistic or generative model.
Can clearly distinguish between multiple classes, or between one class and all others.
Good results under clutter, viewpoint changes, partial occlusion, and scale variation.
Suitable for recognition over many categories.
Discriminative models are simpler than generative ones and easier to learn.
-Disadvantages:
Do not reflect the characteristics of the training data themselves; their capability is limited. A discriminative model can tell you whether a sample is class 1 or class 2, but it has no way to describe the whole scene.
Lack the elegance of generative models: priors, structure, uncertainty.
Must fall back on alternative notions: penalty functions, regularization, kernel functions.
Black-box operation: the relationships between variables are unclear and invisible.
-Common examples include (a quick sketch of several of these follows the applications list below):
Logistic regression
SVMs
Traditional neural networks
Nearest neighbor
Conditional random fields (CRFs): a recently popular model that grew out of NLP and is spreading to ASR and computer vision.
-Main applications:
Image and document classification
Biosequence analysis
Time series prediction
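As a quick sketch of several of the models just listed (mine, assuming scikit-learn is installed; the dataset is synthetic): each of these learns a decision rule, that is P(class | input), but says nothing about how the inputs themselves are distributed.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

for model in (LogisticRegression(),
              SVC(probability=True),
              KNeighborsClassifier(n_neighbors=5)):
    model.fit(X, y)
    # predict_proba returns P(class | input) -- the conditional, not the joint
    print(type(model).__name__, model.predict_proba(X[:1]))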
[Generative Model] -- intra-class probabilistic description
Also called a producing model. It estimates the joint probability distribution, p(class, context) = p(class | context) * p(context).
It models how observed values are randomly generated, especially given some hidden parameters. In machine learning it is used either to model data directly (modeling the observations with a probability density function) or as an intermediate step toward a conditional probability density function; Bayes' rule then yields the conditional distribution from the generative model.
If the observed data really are generated by the assumed model, fitting its parameters maximizes the data likelihood. However, data are rarely drawn exactly from the assumed generative model, so the more accurate approach is often to model the conditional density directly, that is, to use classification or regression analysis.
This differs from a descriptive model, in which all variables are directly measured.
-Main features:
Generally models the class-conditional probability, representing the distribution of the data from a statistical perspective and reflecting the similarity of data within the same class.
Focuses only on the class itself and does not care where the decision boundary lies.
-Advantages:
Actually carries richer information than a discriminative model.
More flexible than discriminative models for studying single-class problems.
Models can be obtained through incremental learning.
Can be used in case of incomplete data (missing data)
Modular construction of composite solutions to complex problems.
Prior knowledge can be easily taken into account
Robust to partial occlusion and viewpoint changes
Can tolerate significant intra-class variation of object appearance
-Disadvantages:
Tend to produce a significant number of false positives. This is particularly true for object classes that share high visual similarity, such as horses and cows.
Complex learning and computing processes
-Common examples include (a counting-based sketch follows this section's applications):
Gaussians, Naive Bayes, Mixtures of multinomials
Mixtures of Gaussians, Mixtures of experts, HMMs
Sigmoidal belief networks, Bayesian networks
Markov random fields
The generative models listed above can also be trained discriminatively, e.g., GMMs or HMMs; training methods include EBW (Extended Baum-Welch) and the large-margin method recently proposed by Fei Sha.
-Main applications:
NLP:
Traditional rule-based or Boolean-logic systems (Dialog and Lexis-Nexis) are giving way to statistical approaches (Markov models and stochastic context-free grammars).
Medical Diagnosis:
The QMR knowledge base, initially a heuristic expert system for reasoning about diseases and symptoms, has been augmented with a decision-theoretic formulation.
Genomics and Bioinformatics:
Sequences represented as generative HMMs.
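As a counting-based sketch of the generative idea (a toy of my own, not from the sources above): because a generative model holds the joint p(context, class), it can both classify via Bayes' rule and generate plausible (context, class) pairs, which a purely discriminative model cannot do.

from collections import Counter
import random

data = [("sunny", "play"), ("sunny", "play"),
        ("rainy", "stay"), ("rainy", "play")]

joint = Counter(data)    # counts approximate p(context, class)

def p_class_given_context(context):
    # Bayes' rule: p(class | context) = p(context, class) / p(context)
    total = sum(c for (ctx, _), c in joint.items() if ctx == context)
    return {cls: c / total for (ctx, cls), c in joint.items() if ctx == context}

def sample_pair():
    # Sampling straight from the joint distribution -- "generation"
    pairs, weights = zip(*joint.items())
    return random.choices(pairs, weights=weights)[0]

print(p_class_given_context("rainy"))    # classification via Bayes' rule
print(sample_pair())                     # generating a plausible pair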
[Relationship between the two]
A discriminative model can be derived from a generative model, but a generative model cannot be derived from a discriminative one.
Can performance of SVMs be combined elegantly with flexible Bayesian statistics?
Maximum Entropy Discrimination marries both methods: solve over a distribution of parameters (a distribution over solutions).
[Reference website]
http://prfans.com/forum/viewthread.php?tid=80
http://hi.baidu.com/cat_ng/blog/item/5e59c3cea730270593457e1d.html
http://en.wikipedia.org/wiki/Generative_model
http://blog.csdn.net/yangleecool/archive/2009/04/05/4051029.aspx
============================
Comparison of three models: HMM, MRF, and CRF
http://blog.sina.com.cn/s/blog_4cdaefce010082rm.html
HMM (Hidden Markov Model):
The state sequence cannot be observed directly (it is hidden);
Each observation is regarded as a random function of the state sequence;
The state changes over time according to the transition probability matrix (a forward-algorithm sketch follows below).
The difference from an MRF is that the MRF contains only label-field variables and no observation-field variables.
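To ground the description, here is a minimal forward-algorithm sketch (the numbers are illustrative choices of mine): hidden states evolve by the transition matrix A, each observation is emitted according to B, and the recursion accumulates the probability of the observed sequence.

import numpy as np

A = np.array([[0.7, 0.3],     # A[i, j] = P(state j at t+1 | state i at t)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],     # B[i, k] = P(observation k | state i)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])     # initial state distribution

obs = [0, 1, 1]               # an observed sequence (column indices of B)

alpha = pi * B[:, obs[0]]     # forward variable at t = 0
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
print("P(obs) =", alpha.sum())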
MRF (Markov Random Field):
Models an image as a grid of random variables,
where each variable depends explicitly (the Markov property) only on its nearest neighbors among the other variables.
CRF (Conditional Random Field): a Markov random field globally conditioned on the observations;
a conditional probability model for labeling and segmenting sequential data.
Formally, a CRF can be viewed as an undirected graphical model that evaluates the conditional probability of a label sequence given an input sequence, as in the sketch below.
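A minimal numerical sketch of that view (mine; the scores are random stand-ins for real feature functions): a linear-chain CRF assigns every label sequence a score built from per-position and transition potentials, then normalizes globally over all label sequences with a forward recursion.

import numpy as np

T, L = 3, 2                   # sequence length, number of labels
emis = np.random.default_rng(1).normal(size=(T, L))    # per-position label scores
trans = np.random.default_rng(2).normal(size=(L, L))   # label-to-label scores

def score(labels):            # unnormalized score of one label sequence
    s = emis[0, labels[0]]
    for t in range(1, T):
        s += trans[labels[t - 1], labels[t]] + emis[t, labels[t]]
    return s

# log-partition function Z over all L**T label sequences, via forward recursion
log_alpha = emis[0]
for t in range(1, T):
    log_alpha = emis[t] + np.logaddexp.reduce(log_alpha[:, None] + trans, axis=0)
logZ = np.logaddexp.reduce(log_alpha)

labels = [0, 1, 1]
print("p(labels | input) =", np.exp(score(labels) - logZ))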
Application in visual problems:
HMM: image denoising, image texture segmentation, blurred-image restoration, texture image retrieval, automatic target recognition, etc.
MRF: image restoration, image segmentation, edge detection, texture analysis, target matching and recognition.
CRF: target detection, recognition, and segmentation in image sequences.
P.S.
The label field is a hidden random field describing the local correlation of pixels; the model chosen should be flexible and based on one's understanding of the image structure and features.
Prior models for the spatial label field mainly include non-causal Markov models and causal Markov models.
Reference: link
============================
Probabilistic graphical models: generative vs. discriminative models
In natural language processing we often deal with sequence labeling problems (word segmentation, part-of-speech tagging, chunking): given an observation sequence, assign a label sequence.
Let O and S denote the observation sequence and the label sequence respectively.
By Bayes' formula, p(S | O) = p(O | S) p(S) / p(O).
1. Definitions of generative and discriminative models
There are two ways to statistically model O and S:
(1) Generative model:
model the joint distribution of O and S, p(S, O).
(2) Discriminative model:
model the conditional distribution of S given O, p(S | O).
2. Comparison of discriminative and generative models
(1) Different optimization criteria during training.
A generative model optimizes the likelihood of the training data under the joint distribution;
a discriminative model optimizes the conditional likelihood of the training data, which corresponds naturally to the sequence labeling problem.
(2) Different treatment of the observation sequence.
In a generative model, the observation sequence is part of the model;
in a discriminative model, the observation sequence is only conditioned on, so flexible features of the observations can be designed.
(3) Different training complexity.
Discriminative models are more expensive to train.
(4) Whether unsupervised training is supported.
Generative models support unsupervised training.
From: link
============================
An easy-to-understand explanation is as follows:
Let's say you have input data x and you want to classify the data into labels y. A generative model learns the joint probability distribution p(x, y), and a discriminative model learns the conditional probability distribution p(y | x), which you should read as 'the probability of y given x'.
Here's a really simple example. Suppose you have the following data in the form (x, y):
(1,0), (1,0), (2,0), (2,1)
p(x, y) is:

      y=0   y=1
x=1 | 1/2   0
x=2 | 1/4   1/4

p(y | x) is:

      y=0   y=1
x=1 | 1     0
x=2 | 1/2   1/2
If you take a few minutes to stare at those two matrices, you will understand the difference between the two probability distributions.
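If you would rather verify than stare, this tiny snippet (mine) recomputes both matrices from the four data points by counting:

from collections import Counter

data = [(1, 0), (1, 0), (2, 0), (2, 1)]
joint = Counter(data)
n = len(data)

print({pair: c / n for pair, c in joint.items()})    # p(x, y)

for xv in (1, 2):
    row = {yv: c for (x, yv), c in joint.items() if x == xv}
    total = sum(row.values())
    print(xv, {yv: c / total for yv, c in row.items()})    # p(y | x = xv)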
The distribution p(y | x) is the natural distribution for classifying a given example x into a class y, which is why algorithms that model this directly are called discriminative algorithms. Generative algorithms model p(x, y), which can be transformed into p(y | x) by applying Bayes' rule and then used for classification. However, the distribution p(x, y) can also be used for other purposes; for example, you could use p(x, y) to generate likely (x, y) pairs.
From the description above you might be thinking that generative models are more generally useful and therefore better, but it's not as simple as that. This paper is a very popular reference on the subject of discriminative vs. generative classifiers, but it's pretty heavy going. The overall gist is that discriminative models generally outperform generative models in classification tasks.
Another explanation is as follows:
- The discriminative model can also be called a conditional model or conditional probability model. It estimates the conditional probability distribution p(class | context).
- The generative model, also called a producing model, estimates the joint probability distribution, p(class, context) = p(class | context) * p(context).
From: http://www.zhizhihu.com/html/y2010/1468.html