NLP: Generative and discriminative models in statistical machine learning

Source: Internet
Author: User

Machine learning methods can be divided into generative and discriminative methods.

Generative model: assume that the input is X and the category label is Y. The generative model estimates the joint probability P(X, Y). It is called generative because new samples can be generated from the joint probability.

Discriminative model: assume that the input is X and the category label is Y. The discriminative model estimates the conditional probability P(Y | X). This model can only be used for classification, because it contains no knowledge about the distribution of X itself.

The following example illustrates the two concepts.

Suppose we are given a group of samples, and assume that these are all that can be observed.

Generative model estimate P(Y, X), for example: P(Y = 1, X = 0) = 1/2; P(Y = 1, X = 1) = 0; P(Y = 2, X = 0) = 1/4; P(Y = 2, X = 1) = 1/4.

Discriminative model estimate P(X | Y), for example: P(X = 0 | Y = 1) = 1, P(X = 1 | Y = 1) = 0, P(X = 0 | Y = 2) = 1/2, P(X = 1 | Y = 2) = 1/2.
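The counting behind both kinds of estimate can be sketched in Python. The sample list below is hypothetical: the original samples are not shown above, so four (Y, X) pairs were picked that reproduce the conditional estimates as quoted. The function names are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical (Y, X) samples: Y is the attribute (1 or 2), X is the class (0 or 1).
# This list is an assumption, chosen to reproduce the conditional estimates in the text.
samples = [(1, 0), (1, 0), (2, 0), (2, 1)]

Y_VALUES = (1, 2)
X_VALUES = (0, 1)

def estimate_joint(pairs):
    """Generative estimate: P(Y=y, X=x) as the relative frequency of each pair."""
    counts = Counter(pairs)
    n = len(pairs)
    return {(y, x): Fraction(counts[(y, x)], n)
            for y in Y_VALUES for x in X_VALUES}

def estimate_conditional(pairs):
    """Discriminative estimate: P(X=x | Y=y), counted within each attribute value."""
    pair_counts = Counter(pairs)
    y_counts = Counter(y for y, _ in pairs)
    return {(x, y): Fraction(pair_counts[(y, x)], y_counts[y])
            for y in Y_VALUES for x in X_VALUES}

joint = estimate_joint(samples)        # keyed by (y, x)
cond = estimate_conditional(samples)   # keyed by (x, y), read as P(x | y)

for key, prob in joint.items():
    print("P(Y, X) at", key, "=", prob)
for key, prob in cond.items():
    print("P(X | Y) at", key, "=", prob)
```

Exact fractions are used instead of floats so the printed values can be compared directly with the estimates in the text.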

Note that in this example the roles of the symbols are reversed relative to the definitions above: X is the category label (two classes, 0 and 1) and Y is the sample attribute (two values, 1 and 2). Given a sample's attribute, we can classify it with the discriminative model. For example, if the attribute is 1, the sample must belong to class 0. However, the discriminative model cannot generate new samples, because the conditional estimates tell us nothing about P(Y). The generative model can, because P(Y) is recoverable from the joint estimate: here P(Y = 1) = 1/2 and P(Y = 2) = 1/2. Likewise, a discriminative model can be derived from a generative model through Bayes' formula, but not the other way around.
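As a rough sketch of this argument, the Python below starts from a joint table and recovers both P(Y) and P(X | Y), then draws new samples. The joint values are not copied from a sample list; they are obtained by multiplying the quoted conditional estimates by P(Y = 1) = P(Y = 2) = 1/2. The function names and the fixed random seed are assumptions made for illustration.

```python
import random
from fractions import Fraction

# Joint table P(Y=y, X=x), keyed by (y, x). Values follow from
# P(X | Y) * P(Y) using the estimates quoted in the text.
joint = {
    (1, 0): Fraction(1, 2), (1, 1): Fraction(0),
    (2, 0): Fraction(1, 4), (2, 1): Fraction(1, 4),
}

def marginal_y(table):
    """P(Y=y) = sum over x of P(Y=y, X=x)."""
    p_y = {}
    for (y, _x), prob in table.items():
        p_y[y] = p_y.get(y, Fraction(0)) + prob
    return p_y

def conditional_x_given_y(table):
    """P(X=x | Y=y) = P(Y=y, X=x) / P(Y=y); assumes every P(Y=y) > 0."""
    p_y = marginal_y(table)
    return {(x, y): prob / p_y[y] for (y, x), prob in table.items()}

def sample_pairs(table, n, seed=0):
    """Draw n new (y, x) pairs from the joint distribution."""
    rng = random.Random(seed)
    outcomes = list(table)
    weights = [float(table[o]) for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=n)

p_y = marginal_y(joint)               # the generative model knows P(Y)
cond = conditional_x_given_y(joint)   # and yields the discriminative estimate
new_samples = sample_pairs(joint, 5)  # so it can also generate data
print(p_y, cond, new_samples)
```

Going the other way is not possible with the conditional table alone: `sample_pairs` needs a probability for every (y, x) pair, and P(X | Y) carries no information about how often each attribute value occurs.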

Of course, these two models have their own advantages and disadvantages:

The main feature of the discriminative approach is that it looks for the optimal classification surface between categories, reflecting the differences between data of different classes. Typical examples are linear discriminant analysis and SVM.

Advantages of discriminative models:

1) The classification boundary is more flexible than with a purely probabilistic method or a generative model.
2) They can clearly distinguish between multiple classes, or between one class and all other classes.
3) They perform better under clutter, viewpoint changes, partial occlusion, and scale variations.
4) They are suitable for recognition problems with many categories.
5) A discriminative model is simpler and easier to learn than a generative model.

Disadvantages:

1) It cannot reflect the characteristics of the training data itself, so its capabilities are limited. It can tell you whether a sample is class 0 or class 1, but it has no way to describe the whole scene.

2) Black-box operation: the relationships between variables are unclear and cannot be inspected. This is really a consequence of the first drawback.

Generative features: the joint distribution is generally modeled through the class-conditional likelihood and the class prior, and the posterior is then obtained with Bayes' formula. This represents the data distribution from a statistical perspective and reflects the similarity of data within the same class. Examples are Naive Bayes and hidden Markov models.

Advantages of generative models:

1) They carry richer information than the discriminative model.
2) They are more flexible than the discriminative model when studying single-class problems.
3) The model can be obtained through incremental learning.
4) They can be used when the data is incomplete (missing data).
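The missing-data point can be illustrated with Naive Bayes, one of the generative models named above. The sketch below is only an illustration: the priors and likelihoods are made-up numbers, the two binary features are hypothetical, and `None` is an assumed convention for a missing value. Because Naive Bayes factorizes the likelihood over features, summing out a missing feature amounts to dropping its factor.

```python
# Hypothetical Naive Bayes parameters for two classes and two binary features.
# All numbers are invented for illustration.
prior = {0: 0.6, 1: 0.4}
# likelihood[c][j] = P(feature j = 1 | class c)
likelihood = {0: [0.2, 0.7], 1: [0.9, 0.3]}

def posterior(features):
    """Return P(class | observed features); None marks a missing feature."""
    scores = {}
    for c, p_c in prior.items():
        score = p_c
        for j, value in enumerate(features):
            if value is None:
                continue  # missing feature: its factor sums to 1 and drops out
            p_one = likelihood[c][j]
            score *= p_one if value == 1 else 1.0 - p_one
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

full = posterior([1, 0])        # both features observed
partial = posterior([1, None])  # second feature missing
print(full, partial)
```

A model that only stores P(class | all features) has no comparable way to ignore an unobserved input, which is the contrast this advantage refers to.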

Disadvantages:

1) The learning and computation processes are more complicated.

We can see that the most important difference between the discriminative model and the generative model is the training objective. The discriminative model mainly optimizes the conditional probability distribution so that x and y correspond more closely, which in classification means the classes become more separable. The generative model mainly optimizes the joint probability of the training data.
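One common way to write the two objectives, assuming maximum-likelihood training (the text does not name a specific estimator), is shown below. Here x_i is the input and y_i the label of the i-th training sample; the parameter vector \theta and the sample count N are notation introduced only for this sketch.

```latex
% Discriminative training: maximize the conditional log-likelihood
\hat{\theta}_{\mathrm{disc}}
  = \arg\max_{\theta} \sum_{i=1}^{N} \log P(y_i \mid x_i; \theta)

% Generative training: maximize the joint log-likelihood
\hat{\theta}_{\mathrm{gen}}
  = \arg\max_{\theta} \sum_{i=1}^{N} \log P(x_i, y_i; \theta)
  = \arg\max_{\theta} \sum_{i=1}^{N}
      \bigl[ \log P(x_i \mid y_i; \theta) + \log P(y_i; \theta) \bigr]
```

The second form of the generative objective makes the extra modeling burden visible: the model must also explain how the inputs are distributed, not only how labels depend on them.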
