Describes in detail how to use the naive Bayes algorithm in Python.

Source: Internet
Author: User
This article describes how to use the naive Bayes algorithm in Python. It should make a good reference; let's take a look.

First, a note on why the title says "use" rather than "implement":

First, the algorithms provided by professionals are more efficient and accurate than anything we would write ourselves.

Second, for those of us who are weak in mathematics, wading through a pile of formulas in order to implement an algorithm is painful.

Finally, unless the algorithms others provide cannot meet your needs, there is no need to "reinvent the wheel".

The following is a worked example. If you do not know the Bayesian algorithm, you can consult the relevant references; here is a brief introduction:

1. Bayesian formula:

P(A|B)=P(AB)/P(B)

2. Bayesian inference:

P(A|B)=P(A)×P(B|A)/P(B)

In text:

Posterior probability = prior probability × likelihood / normalizing constant

What the naive Bayes algorithm works out is the likelihood, that is, the value of P(B|A).
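As a quick numeric illustration of the inference formula (the spam-filter numbers below are my own invented example, not from the original article):

>>> # suppose 20% of mail is spam, "free" appears in 40% of spam and in 10% of all mail
>>> p_spam, p_free_given_spam, p_free = 0.2, 0.4, 0.1
>>> p_spam * p_free_given_spam / p_free   # posterior = prior * likelihood / normalizing constant
0.8

So the posterior probability that a message containing "free" is spam comes out to 0.8.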

3. Three common naive Bayes variants are provided in the scikit-learn package. The following describes them in sequence:

1) Gaussian naive Bayes: assumes the features are normally distributed; it is mainly used for numeric features.

We use the data bundled with the scikit-learn package; the code and comments are as follows:

>>> from sklearn import datasets               # import the datasets bundled with the package
>>> iris = datasets.load_iris()                # load the data
>>> iris.feature_names                         # display the feature names
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
>>> iris.data                                  # display the data
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       ...
>>> iris.data.size                             # data size: 150 samples x 4 features
600
>>> iris.target_names                          # display the class names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
>>> from sklearn.naive_bayes import GaussianNB  # import the Gaussian naive Bayes algorithm
>>> clf = GaussianNB()                          # bind the algorithm to a variable for ease of use
>>> clf.fit(iris.data, iris.target)             # start training; for a large sample, the partial_fit function trains in batches and avoids loading too much data into memory at once
GaussianNB()
>>> clf.predict(iris.data[0].reshape(1, -1))    # verify a known sample; predict expects a 2-D array, so the single row iris.data[0] must be reshaped
array([0])
>>> import numpy as np
>>> data = np.array([6, 4, 6, 2])               # verify the class of a new sample
>>> clf.predict(data.reshape(1, -1))
array([2])
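As noted in the comment on fit above, partial_fit supports incremental training on samples too large for memory. A minimal sketch (the two-batch split is my own illustration, not from the original article):

>>> # the full list of classes must be supplied on the first call
>>> clf2 = GaussianNB()
>>> clf2.partial_fit(iris.data[:75], iris.target[:75], classes=np.unique(iris.target))
GaussianNB()
>>> clf2.partial_fit(iris.data[75:], iris.target[75:])   # later batches omit classes
GaussianNB()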
 

Here we have a question: how can we determine that the data conform to a normal distribution? In the R language there are functions for testing this, and plotting the data also makes it visible, but those approaches work on pairs (x, y) that can be drawn directly in a coordinate system.

I have not yet figured out how to check this for the data in the example; this part will be added later.
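One possible check, offered as a sketch of my own rather than part of the original article, is SciPy's D'Agostino-Pearson normality test applied to each feature column:

>>> # assumes SciPy is installed; a small p-value suggests the feature deviates from normal
>>> from scipy import stats
>>> statistic, p_values = stats.normaltest(iris.data, axis=0)
>>> p_values                                   # one p-value per feature column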

2) Multinomial naive Bayes: often used for text classification, where the features are words and the values are the number of times each word appears.

# Examples are provided in the official documentation; the following is based on the first one
>>> import numpy as np
>>> X = np.random.randint(5, size=(6, 100))    # random integers in [0, 5), 6 rows x 100 columns
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, y)
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
>>> print(clf.predict(X[2:3]))                 # predict expects a 2-D array, so take a one-row slice
[3]
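Because the features here are word counts, a slightly more realistic sketch (my own toy corpus, not from the original article) builds the count matrix with scikit-learn's CountVectorizer:

>>> from sklearn.feature_extraction.text import CountVectorizer
>>> docs = ['free money now', 'meeting at noon', 'free offer now']   # toy corpus
>>> labels = np.array([1, 0, 1])                                     # 1 = spam, 0 = ham
>>> vec = CountVectorizer()
>>> counts = vec.fit_transform(docs)           # sparse matrix of word counts
>>> clf = MultinomialNB().fit(counts, labels)
>>> clf.predict(vec.transform(['free money']))
array([1])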

3) Bernoulli naive Bayes: every feature is Boolean, taking the value 0 or 1 to indicate whether or not the feature appears.

# Examples are provided in the official documentation; the following is based on the first one
>>> import numpy as np
>>> X = np.random.randint(2, size=(6, 100))    # Boolean features: random 0/1 values, 6 rows x 100 columns
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
>>> print(clf.predict(X[2:3]))
[3]
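If the raw features are counts rather than 0/1, the binarize parameter thresholds them first. A minimal sketch with made-up values (my own addition, not from the original article):

>>> # values greater than the threshold 0.5 are mapped to 1, the rest to 0
>>> X_counts = np.random.randint(5, size=(6, 100))
>>> clf = BernoulliNB(binarize=0.5)
>>> clf.fit(X_counts, Y)
BernoulliNB(alpha=1.0, binarize=0.5, class_prior=None, fit_prior=True)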

Note: this article is not yet complete; some of the explanation in example 1 still needs to be written. Things have been busy recently, and it will be improved later.

The preceding sections detail how to use the naive Bayes algorithm in Python. For more information, see other related articles.
