Introduction to how to use the naive Bayesian algorithm in Python

Last Update:2017-03-21 Source: Internet

Author: User

First, here is why the title says "using" rather than "implementing":

First, the implementations provided by professionals are better than ones we write ourselves, in both efficiency and accuracy.

Second, for those who are not good at maths, working through a pile of formulas just to implement the algorithm is painful.

Third, there is no need to "reinvent the wheel" unless the algorithms provided by others do not meet your needs.

Now to the point. If you are not familiar with the Bayesian algorithm, look up the relevant material; only a brief introduction is given here:

1. Bayesian formula:

P(A|B) = P(AB) / P(B)

2. Bayesian Inference:

P(A|B) = P(A) × P(B|A) / P(B)

To express in words:

Posterior probability = prior probability × likelihood / normalizing constant

The problem the Bayesian algorithm has to solve is how to obtain the likelihood, that is, the value of P(B|A).
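To make the formula concrete, here is a small worked example. The spam scenario and every probability in it are made-up numbers chosen for illustration; they do not come from any dataset.

```python
# Hypothetical events: A = "message is spam", B = "message contains the word 'free'".
p_a = 0.2                # prior P(A)
p_b_given_a = 0.6        # likelihood P(B|A)
p_b_given_not_a = 0.05   # P(B|not A)

# Normalizing constant P(B), from the law of total probability.
p_b = p_a * p_b_given_a + (1 - p_a) * p_b_given_not_a

# Posterior P(A|B) = P(A) * P(B|A) / P(B)
p_a_given_b = p_a * p_b_given_a / p_b
print(round(p_a_given_b, 3))  # 0.75
```

Seeing the word raises the probability of spam from the prior of 0.2 to a posterior of 0.75, because the word is far more likely under A than under not-A.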

3. The scikit-learn package provides three common naive Bayes algorithms, described in turn below:

1) Gaussian naive Bayes: assumes that the attributes/features follow a normal distribution, and is mainly applied to numerical features.

Using the data that ships with the scikit-learn package, the code and explanation are as follows:

>>> from sklearn import datasets  ## import the datasets bundled with the package
>>> iris = datasets.load_iris()  ## load the data
>>> iris.feature_names  ## show the feature names
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
>>> iris.data  ## show the data
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       ...
>>> iris.data.size  ## number of values: 150 samples x 4 features
600
>>> iris.target_names  ## show the class names
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')
>>> from sklearn.naive_bayes import GaussianNB  ## import Gaussian naive Bayes
>>> clf = GaussianNB()  ## bind the classifier to a name, mainly for ease of use
>>> clf.fit(iris.data, iris.target)  ## train the classifier; for a very large sample, partial_fit can train in batches so that not all the data has to be in memory
>>> clf.predict(iris.data[0].reshape(1, -1))  ## check a prediction; predict expects a 2-D array (one row per sample) and iris.data[0] is 1-D, so it is reshaped into a single row
array([0])
>>> import numpy as np
>>> data = np.array([6, 4, 6, 2])  ## check a prediction on a new sample
>>> clf.predict(data.reshape(1, -1))
array([2])
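The partial_fit function mentioned above can be sketched as follows. This is only an illustration: iris easily fits in memory, and the batch size of 50 is an arbitrary assumption standing in for chunks read from a larger source.

```python
import numpy as np
from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

# iris is stored sorted by class, so shuffle to get mixed batches.
rng = np.random.RandomState(0)
order = rng.permutation(len(iris.data))
X, y = iris.data[order], iris.target[order]

clf = GaussianNB()
all_classes = np.unique(iris.target)  # the full label set, required on the first call

batch_size = 50  # hypothetical batch size
for start in range(0, len(X), batch_size):
    stop = start + batch_size
    # Each call updates the per-class means and variances incrementally.
    clf.partial_fit(X[start:stop], y[start:stop], classes=all_classes)

print(clf.predict(iris.data[0].reshape(1, -1)))  # [0]
```

In a real large-sample setting, each batch would be read from disk or a database inside the loop, so only one batch is held in memory at a time.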

This raises a question: how can you tell whether the data follow a normal distribution? The R language has functions for this, and plotting the data directly also shows it, but those approaches are all for P(x, y) data that can be drawn directly in a coordinate system. How to judge the data in this example is not yet clear to me; this part will be added in a follow-up.
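One possible way to approach the question in Python is a Shapiro-Wilk test from SciPy, applied to each feature within each class. This is a sketch, not part of the original article: the choice of test and the conventional 0.05 threshold are assumptions, and a small p-value only says that normality is rejected, not how badly the classifier will be affected.

```python
from scipy import stats
from sklearn import datasets

iris = datasets.load_iris()

# Gaussian naive Bayes assumes each feature is normal *within each class*,
# so every (class, feature) pair is tested separately.
results = {}
for class_index, class_name in enumerate(iris.target_names):
    rows = iris.data[iris.target == class_index]
    for feature_index, feature_name in enumerate(iris.feature_names):
        statistic, p_value = stats.shapiro(rows[:, feature_index])
        results[(class_name, feature_name)] = p_value
        verdict = "looks normal" if p_value > 0.05 else "normality rejected"
        print(f"{class_name:<11} {feature_name:<18} p={p_value:.3f}  {verdict}")
```

A visual alternative is a Q-Q plot per feature, for example with scipy.stats.probplot, which is closer to the "draw it and look" approach described above.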

2) Multinomial naive Bayes: commonly used for text classification, where the features are words and each value is the number of times a word occurs.

## Example taken from the official documentation
>>> import numpy as np
>>> X = np.random.randint(5, size=(6, 100))  ## random integers in the range [0, 5), 6 rows by 100 columns
>>> y = np.array([1, 2, 3, 4, 5, 6])
>>> from sklearn.naive_bayes import MultinomialNB
>>> clf = MultinomialNB()
>>> clf.fit(X, y)
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
>>> print(clf.predict(X[2:3]))  ## X[2:3] keeps the sample as a 2-D array, which predict requires
[3]
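Since the random matrix above does not show the text classification use case, here is a minimal sketch that builds word counts with scikit-learn's CountVectorizer. The tiny corpus and its "spam"/"ham" labels are invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training corpus with hypothetical labels.
train_texts = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes attached",
]
train_labels = ["spam", "spam", "ham", "ham"]

# Each column is a word, each value is how often it occurs in the document.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

clf = MultinomialNB()
clf.fit(X_train, train_labels)

# New documents must be transformed with the same fitted vocabulary.
X_new = vectorizer.transform(["buy cheap pills", "monday project meeting"])
predictions = clf.predict(X_new)
print(predictions)  # ['spam' 'ham']
```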

3) Bernoulli naive Bayes: each feature is Boolean, taking the value 0 or 1, that is, the feature is either absent or present.

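## Example taken from the official documentation; for a detailed explanation see the previous example
>>> import numpy as np
>>> X = np.random.randint(2, size=(6, 100))  ## random 0/1 values, 6 rows by 100 columns as in the previous example
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
>>> print(clf.predict(X[2:3]))  ## X[2:3] keeps the sample as a 2-D array, which predict requires
[3]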
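The binarize parameter that appears in the output above controls how non-Boolean input is turned into 0/1 features. The following sketch uses a small hand-made count matrix (the column meanings and labels are hypothetical) to show that only presence or absence matters.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical word counts; columns stand for "cheap", "buy", "meeting", "agenda".
X_counts = np.array([
    [3, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 3],
])
y = np.array([1, 1, 0, 0])  # made-up labels: 1 = spam, 0 = ham

# binarize=0.0 (the default) maps every value > 0 to 1 and everything else to 0.
clf = BernoulliNB(binarize=0.0)
clf.fit(X_counts, y)

# A count of 5 is treated exactly like a count of 1: the word is simply "present".
X_new = np.array([[5, 0, 0, 0], [0, 0, 1, 0]])
predictions = clf.predict(X_new)
print(predictions)  # [1 0]
```

Unlike the multinomial variant, the Bernoulli model also counts the absence of a word as evidence, which is why it tends to suit short documents with Boolean features.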

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
