Naive Bayes vs. logistic regression: the differences

Source: Internet
Author: User

Summing up, there are several differences:


(1) Naive Bayes is a generative model: the probabilities P(x|y) and P(y) are first estimated from the training data, and P(y|x) is then computed from them with Bayes' formula.


Logistic regression is a discriminative model: it is learned by directly maximizing the conditional probability P(y|x) on the training data set, and it does not need to know P(x|y) or P(y).
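The generative route described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy data set and the helper names `fit_counts` and `posterior` are made up here, no smoothing is applied, and the query is assumed to contain only feature values seen in training for at least one class.

```python
from collections import Counter, defaultdict

# Toy training set (made up for illustration): each row is (features, label).
train = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "yes"),
    (("sunny", "mild"), "yes"),
    (("rainy", "mild"), "no"),
]

def fit_counts(rows):
    """Estimate the ingredients of P(y) and P(x_i | y) by counting."""
    class_counts = Counter(y for _, y in rows)
    feature_counts = defaultdict(Counter)  # (position, label) -> value counts
    for x, y in rows:
        for i, value in enumerate(x):
            feature_counts[(i, y)][value] += 1
    return class_counts, feature_counts

def posterior(x, class_counts, feature_counts):
    """Return P(y | x) via Bayes' formula under the naive independence assumption."""
    total = sum(class_counts.values())
    joint = {}
    for y, n_y in class_counts.items():
        p = n_y / total  # P(y)
        for i, value in enumerate(x):
            p *= feature_counts[(i, y)][value] / n_y  # P(x_i | y), unsmoothed
        joint[y] = p  # P(x, y)
    evidence = sum(joint.values())  # P(x); assumed non-zero in this sketch
    return {y: p / evidence for y, p in joint.items()}

class_counts, feature_counts = fit_counts(train)
print(posterior(("rainy", "mild"), class_counts, feature_counts))
```

Nothing is optimized here: P(y|x) falls out of the counts. A logistic regression would instead fit weights so that P(y|x) itself matches the labels.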


(2) Naive Bayes rests on the conditional independence assumption: if the feature vector x consists of n attributes (x1, x2, ..., xn), then, given y, the attributes x1, x2, ..., xn are assumed to be conditionally independent of one another.
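Written out, the assumption lets the class-conditional likelihood factorize, which is what makes the posterior cheap to compute. The notation follows the text above; the summation index y' ranging over all classes is introduced here for the normalizer.

```latex
\[
P(x_1, x_2, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
\quad\Longrightarrow\quad
P(y \mid x) = \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid y)}
                   {\sum_{y'} P(y')\,\prod_{i=1}^{n} P(x_i \mid y')}
\]
```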


Logistic regression imposes a much looser restriction. If the data satisfy the conditional independence assumption, logistic regression can achieve very good results. If the data do not satisfy it, logistic regression can still adjust its parameters to fit the data distribution as closely as possible, and so training yields the best model available for the given data set.


(3) When the data set is relatively small, naive Bayes is the better choice: to achieve good results it needs on the order of O(log n) training examples, where n is the number of features.


When the data set is relatively large, logistic regression is the better choice: to achieve good results it needs on the order of O(n) training examples.


Naive Bayes adopts the stricter conditional independence assumption. To compute P(y|x), we only need to count occurrences in the data set to estimate P(x|y) and P(y). As a result, the amount of data needed is smaller, O(log n).


Logistic regression, by contrast, searches the entire parameter space for a linear model when it is fitted, so the data set it requires is larger, O(n).


Similarities


Both logistic regression and naive Bayes are linear models of the features.
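A short derivation makes the linearity concrete for one common case. It assumes two classes and binary features x_i in {0, 1}; the symbols θ_{i1} = P(x_i = 1 | y = 1), θ_{i0} = P(x_i = 1 | y = 0), w_i and b are introduced here for illustration.

```latex
\begin{align*}
\log\frac{P(y=1 \mid x)}{P(y=0 \mid x)}
  &= \log\frac{P(y=1)}{P(y=0)} + \sum_{i=1}^{n} \log\frac{P(x_i \mid y=1)}{P(x_i \mid y=0)} \\
  &= \log\frac{P(y=1)}{P(y=0)}
     + \sum_{i=1}^{n} \left[ x_i \log\frac{\theta_{i1}}{\theta_{i0}}
     + (1 - x_i) \log\frac{1-\theta_{i1}}{1-\theta_{i0}} \right] \\
  &= b + \sum_{i=1}^{n} w_i x_i, \\
w_i &= \log\frac{\theta_{i1}\,(1-\theta_{i0})}{\theta_{i0}\,(1-\theta_{i1})}, \qquad
b = \log\frac{P(y=1)}{P(y=0)} + \sum_{i=1}^{n} \log\frac{1-\theta_{i1}}{1-\theta_{i0}}.
\end{align*}
```

Logistic regression posits the same form, log-odds = b + Σ w_i x_i, but chooses b and w_i by maximizing P(y|x) on the training data rather than reading them off the counts.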

Both logistic regression and naive Bayes model conditional probabilities, so the scores they produce for the different classes are easy to interpret. SVMs and neural networks, by contrast, are not as interpretable.


Differences

Logistic regression performs better on test data when the features are correlated. In other words, when logistic regression is trained, it can find the optimal parameters regardless of the correlation between features. Naive Bayes strictly assumes that the given features are independent of one another, so the weights learned for correlated features grow or shrink together, and the weights are not adjusted against one another. In this respect, compared with naive Bayes, logistic regression depends less on feature engineering, provided its parameters are well controlled and the loss is handled properly.
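A small experiment illustrates the effect of correlated features in its most extreme form, a feature that is simply duplicated. This is a sketch under assumptions: the toy data, the function names, and the plain batch gradient descent (fixed step size, no regularization) are all chosen here for illustration.

```python
import math

# Toy data (made up): one binary feature x and a binary label y.
# In this sample P(y=1 | x=1) = 3/4 and P(y=1 | x=0) = 1/4.
data = [(1, 1)] * 3 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 3

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nb_posterior(x, copies):
    """P(y=1 | x) from naive Bayes when the same feature appears `copies` times."""
    joint = {}
    for label in (0, 1):
        rows = [xi for xi, yi in data if yi == label]
        prior = len(rows) / len(data)
        likelihood = sum(1 for xi in rows if xi == x) / len(rows)
        # Each copy is treated as independent evidence, so it is multiplied in again.
        joint[label] = prior * likelihood ** copies
    return joint[1] / (joint[0] + joint[1])

def fit_logistic(copies, steps=5000, lr=0.5):
    """Batch gradient descent on the mean log-loss; returns (bias, weights)."""
    bias, weights = 0.0, [0.0] * copies
    n = len(data)
    for _ in range(steps):
        grad_b, grad_w = 0.0, [0.0] * copies
        for x, y in data:
            # All copies carry the same value x, so w . x == sum(w) * x.
            err = sigmoid(bias + sum(weights) * x) - y
            grad_b += err / n
            for j in range(copies):
                grad_w[j] += err * x / n
        bias -= lr * grad_b
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return bias, weights

def lr_posterior(x, copies):
    """P(y=1 | x) from the fitted logistic regression."""
    bias, weights = fit_logistic(copies)
    return sigmoid(bias + sum(weights) * x)

for copies in (1, 2):
    print(copies, round(nb_posterior(1, copies), 3), round(lr_posterior(1, copies), 3))
```

With one copy, both models recover the empirical rate of 0.75 for x = 1. With two copies, naive Bayes counts the same evidence twice and moves to 0.9, while logistic regression splits the weight between the copies and stays near 0.75.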


The advantage of naive Bayes is that there is no parameter-optimization step: the training data directly yield a table of counts, which makes training easy to parallelize.
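The reason a counting table parallelizes well is that counts from separate parts of the data simply add up. A minimal sketch, assuming the data are split into two hypothetical shards and using a made-up key layout for the table:

```python
from collections import Counter

def count_shard(rows):
    """Count class events (label,) and feature events (label, position, value)."""
    counts = Counter()
    for features, label in rows:
        counts[(label,)] += 1
        for i, value in enumerate(features):
            counts[(label, i, value)] += 1
    return counts

# Two shards of a toy data set (made up), as if processed by two workers.
shard_a = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")]
shard_b = [(("sunny", "mild"), "yes"), (("rainy", "mild"), "no")]

# Each worker counts its own shard; merging is just Counter addition.
merged = count_shard(shard_a) + count_shard(shard_b)

# The merged table equals the table built from all the data at once.
full = count_shard(shard_a + shard_b)
print(merged == full)
```

Gradient-based training of logistic regression has no such one-pass merge, since every update depends on the current parameters.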


Andrew Ng and Michael Jordan published a NIPS paper in 2001, "On Discriminative vs. Generative Classifiers: A Comparison of Logistic Regression and Naive Bayes". They tested the two models on a variety of data sets and found that naive Bayes gives better results on small data, while logistic regression does better as the amount of data and the feature dimension increase. This is because naive Bayes is a generative model: with a prior model in place, it can fit a small amount of data better. Logistic regression is a discriminative model: it is target-driven, does not model the joint probability, and predicts the output directly from the training data, so it gets better results when enough data are available.
