An introduction to using the naive Bayes algorithm in Python

Source: Internet
Author: User
First, a word on why the title says "using" rather than "implementing":

First, the algorithms provided by professionals are better than anything we would write ourselves, in both efficiency and accuracy.

Second, for those who are not good at maths, it is very painful to study a pile of formulas just to implement the algorithm.

Third, there is no need to "reinvent the wheel" unless the algorithms provided by others do not meet your needs.

Now to the point. If you are not familiar with Bayesian methods, look up the relevant material; what follows is only a brief introduction:

1. Bayes' formula:

P(A|B) = P(AB) / P(B)

2. Bayesian inference:

P(A|B) = P(A) x P(B|A) / P(B)

In words:

Posterior probability = prior probability x likelihood / normalizing constant

The problem a naive Bayes algorithm solves is how to compute the likelihood, namely the value of P(B|A).
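As a worked illustration of the formula above, here is a minimal sketch that plugs numbers into Bayes' rule. The scenario (a spam filter looking at the word "free") and all three input probabilities are hypothetical, chosen only to show how the prior P(A), the likelihood P(B|A) and the normalizing constant P(B) combine.

```python
# Hypothetical numbers, for illustration only.
p_spam = 0.2               # prior P(A): fraction of mail that is spam
p_free_given_spam = 0.6    # likelihood P(B|A): "free" appears in a spam mail
p_free_given_ham = 0.05    # P(B|not A): "free" appears in a normal mail

# Normalizing constant P(B), by the law of total probability.
p_free = p_free_given_spam * p_spam + p_free_given_ham * (1 - p_spam)

# Posterior P(A|B) = P(A) x P(B|A) / P(B)
p_spam_given_free = p_spam * p_free_given_spam / p_free

print(round(p_free, 3))             # 0.16
print(round(p_spam_given_free, 3))  # 0.75
```

With these made-up inputs, seeing the word "free" raises the probability of spam from the prior of 0.2 to a posterior of 0.75.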

3. The scikit-learn package provides three commonly used naive Bayes algorithms, described below in turn:

1) Gaussian naive Bayes: assumes that the attributes/features follow a normal distribution, and is mainly applied to numerical features.

Using the data that ships with the scikit-learn package, the code and explanation are as follows:

>>> import numpy as np                    # needed for np.array further down
>>> from sklearn import datasets          # import the datasets bundled with the package
>>> iris = datasets.load_iris()           # load the data
>>> iris.feature_names                    # show the feature names
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
>>> iris.data                             # show the data
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       ...
>>> iris.data.size                        # size of the data
600
>>> iris.target_names                     # show the class names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
>>> from sklearn.naive_bayes import GaussianNB   # import the Gaussian naive Bayes algorithm
>>> clf = GaussianNB()                    # bind the estimator to a variable, mainly for ease of use
>>> # Train the classifier. For a particularly large sample, the partial_fit
>>> # function can be used instead, to avoid loading too much data into memory.
>>> clf.fit(iris.data, iris.target)
>>> # Check the classification. predict expects a 2-D array (one row per sample),
>>> # while iris.data[0] is a 1-D array, so it has to be reshaped first.
>>> clf.predict(iris.data[0].reshape(1, -1))
array([0])
>>> data = np.array([6, 4, 6, 2])         # check the classification on a new sample
>>> clf.predict(data.reshape(1, -1))
array([2])
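The remark about partial_fit can be sketched as follows. This is only an illustration under stated assumptions: the iris data is small enough to fit in memory, so it is cut into chunks here purely to imitate data arriving piece by piece, and the chunk size of 30 is an arbitrary choice. The one real requirement is that the full list of class labels is passed on the first call to partial_fit.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()
X, y = iris.data, iris.target

clf = GaussianNB()
all_classes = np.unique(y)   # every label must be declared on the first call

chunk_size = 30              # arbitrary value for this sketch
for start in range(0, len(X), chunk_size):
    X_chunk = X[start:start + chunk_size]
    y_chunk = y[start:start + chunk_size]
    # Update the per-class means and variances with this chunk only.
    clf.partial_fit(X_chunk, y_chunk, classes=all_classes)

accuracy = clf.score(X, y)   # accuracy measured on the training data itself
print(accuracy)
```

In a real large-data setting each chunk would be read from disk or a database inside the loop, so that only one chunk is in memory at a time.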

This raises a question: how do you tell whether the data follows a normal distribution? The R language has functions for testing this, and it can also be seen by plotting the data directly, but those approaches are for data such as P(x, y) that can be drawn directly in a coordinate system. How to check the data in this example is something I do not yet understand; this part will be added in a follow-up.
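One possible way to approach this question in Python, which the original article does not cover, is a statistical normality test. The sketch below assumes SciPy is installed (scikit-learn depends on it) and uses the Shapiro-Wilk test from scipy.stats; both the choice of this test and the 0.05 threshold mentioned in the comments are assumptions of this sketch, not a recommendation from the article.

```python
from scipy import stats
from sklearn import datasets

iris = datasets.load_iris()

# Gaussian naive Bayes models each feature as normal *within each class*,
# so every (class, feature) pair is tested separately.
results = {}
for class_index, class_name in enumerate(iris.target_names):
    rows = iris.data[iris.target == class_index]
    for feature_index, feature_name in enumerate(iris.feature_names):
        statistic, p_value = stats.shapiro(rows[:, feature_index])
        results[(class_name, feature_name)] = p_value
        # A small p-value (for example below 0.05) is evidence against normality.
        print(f"{class_name:<12} {feature_name:<20} p = {p_value:.3f}")
```

A test like this only gives evidence against normality; a histogram of each column is a useful visual cross-check.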

2) Multinomial naive Bayes: commonly used in text classification, where the features are words and each value is the number of times a word occurs.

# The example comes from the official documentation; for a detailed description see the first example
>>> import numpy as np
>>> X = np.random.randint(5, size=(6, 100))   # random integers in the range [0, 5), shape 6 x 100 (6 rows, 100 columns)
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, y)
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
>>> print(clf.predict(X[2:3]))                # slice to keep a 2-D array with one row
[3]
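The example above uses random counts. To connect it with text classification, here is a minimal sketch in which the features really are word counts. The four documents, their labels and the test sentence are invented for illustration; CountVectorizer is scikit-learn's standard tool for turning text into a word-count matrix.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus and labels, made up for this sketch.
docs = [
    "cheap pills buy now",
    "buy cheap watches",
    "meeting schedule tomorrow",
    "project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

# Each column is a word, each value is how often it occurs in the document.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

clf = MultinomialNB()
clf.fit(X, labels)

# New text must be transformed with the same fitted vocabulary.
new_doc = ["cheap pills"]
prediction = clf.predict(vectorizer.transform(new_doc))
print(prediction[0])  # spam
```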

3) Bernoulli naive Bayes: every feature is a Boolean whose value is 0 or 1, that is, absent or present.

# The example comes from the official documentation; for a detailed description see the first example
>>> import numpy as np
>>> X = np.random.randint(2, size=(6, 100))   # random 0/1 values, 6 rows and 100 columns
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
>>> print(clf.predict(X[2:3]))                # slice to keep a 2-D array with one row
[3]
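To make the "present or absent" idea concrete, here is a small sketch with hand-written Boolean features. The three features and the labels are hypothetical. It also shows the effect of the binarize parameter that appears in the output above: with the default binarize=0.0, any value greater than 0 is treated as 1, so raw counts and 0/1 flags give the same prediction.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical Boolean features per message:
# [contains "free", contains "meeting", contains a link]
X = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 1, 1],
])
y = np.array([1, 1, 0, 0])  # 1 = spam, 0 = not spam

clf = BernoulliNB()  # default binarize=0.0: any value > 0 is treated as 1
clf.fit(X, y)

boolean_sample = np.array([[1, 0, 1]])
count_sample = np.array([[3, 0, 2]])   # raw counts, binarized to [1, 0, 1]

pred_boolean = clf.predict(boolean_sample)
pred_counts = clf.predict(count_sample)
print(pred_boolean, pred_counts)  # [1] [1]
```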
