http://blog.csdn.net/pipisorry/article/details/52469064
Using independence properties
Conditional parameterization and conditional independence assumptions combine to produce a very compact representation of high-dimensional probability distributions.
Independence of random variables
[PGM: Basic probability theory: using independence properties]
Conditional Parameterization Method
Note: P(I), P(S|i0), and P(S|i1) are all binomial (two-valued) distributions, each requiring only one parameter.
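Written out (restating the note above in formula form), the conditional parameterization of this two-variable fragment is P(I, S) = P(I) · P(S|I): one parameter for P(I) plus one parameter for each of P(S|i0) and P(S|i1), three parameters in total.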
Phi Blog
The naive Bayes model (naive Bayes)
Student example of the naive Bayes model
{This example is a good illustration of what a naive Bayes network model is; it is followed by the generalized model and an application example for classification.}
Problem description
Factor representation of the model
Notation: I denotes intelligence (IQ); S(AT) denotes the SAT score; G(rade) denotes the grade in a certain course.
Note: This representation uses 3 binomial (two-valued) distributions and 2 three-valued distributions.
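In formula form (reconstructed from the notation above), the factorization is P(I, S, G) = P(I) · P(S|I) · P(G|I): the 3 binomial distributions are P(I), P(S|i0), P(S|i1), and the 2 three-valued distributions are P(G|i0), P(G|i1).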
Benefits of factor representation
[Probabilistic Graphical Models: Principles and Techniques, (US) Koller, Friedman]
Phi Blog
The general naive Bayes model
Generalized definition of the naive Bayes model
Note: This corresponds to the student example above: once the class variable C (intelligence I in the example) is given, the features of the class (Grade and SAT in the example) are conditionally independent of each other (this is, in fact, the tail-to-tail structure of a Bayesian network).
The Bayesian network of the naive Bayes model:
Factorization and parameters of the naive Bayes model
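Spelled out for a class variable C and features X1, ..., Xn, the naive Bayes factorization is P(C, X1, ..., Xn) = P(C) · ∏i P(Xi|C), so the parameters are the prior P(C) plus one conditional distribution P(Xi|C) per feature.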
Using naive Bayesian models for classification
That is to say, a naive Bayes classifier mainly trains the parameters P(c) {each individual P(ci)} and P(x|c) {each individual P(xi|ci) = num(xi=i, c=ci) / num(c=ci)}; these parameters can be computed directly from the training data as frequencies (the MLE method). The category of a new sample new_x is then predicted by taking the class c that maximizes P(c|new_x).
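In formula form, prediction picks c* = argmax_c P(c) · ∏i P(new_xi|c); by Bayes' rule P(c|new_x) ∝ P(c) · P(new_x|c), and the normalizing term P(new_x) is the same for every class, so it can be dropped when comparing classes.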
Advantages and disadvantages of the naive Bayes classification algorithm
Pros: still effective with little data; can handle multi-class problems.
Cons: sensitive to how the input data is prepared.
Applicable data type: nominal (categorical) data.
The naive Bayes method requires no structure learning, and building the network structure is very simple; experiments and practice show that its classification performance is good.
In practical applications, however, the naive Bayes classifier's strong assumption that the individual attributes are mutually independent is hard to satisfy. This independence should be understood in a broad sense: conditional independence among the attribute variables means that the dependencies among the attribute variables are negligible relative to the dependency between each attribute variable and the class variable. This is one of the main reasons why the range over which the naive Bayes classifier is optimal is much larger than one might imagine. With its simple structure and good performance, the naive Bayes classifier has attracted much attention and is one of the best classifiers. In theory it is optimal when its assumptions are satisfied, but those assumptions are strong; one can try to weaken them to enlarge the range of optimality and obtain better classifiers. The naive Bayes classifier can thus be extended to the generalized naive Bayes classifier.
Python implementation of naive Bayesian classification algorithm
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
__title__ = 'naive Bayes algorithm (also applicable to multi-class classification)'
__author__ = 'pika'
__mtime__ = '16-5-23'
__email__ = '[email protected]'
# code is far away from bugs with the god animal protecting
    I love animals. They taste delicious.
"""
import numpy as np

train_file = r'./trainingdata.txt'
test_file = r'./testingdata.txt'


def train_naive_bayes(x, y):
    '''
    Train the parameters P(c) {containing each individual P(ci)} and P(x|c) {containing each individual P(xi|ci)}
    '''
    p_c = {}         # P(c) = {ci: P(ci)}
    p_x_cond_c = {}  # P(x|c) = {ci: [P(xi|ci)]}
    for l in np.unique(y):
        # for label l, the array of probabilities [P(xi=1|c=l)]; then 1 - [P(xi=1|c=l)] is [P(xi=0|c=l)]
        p_x_cond_c[l] = x[y == l].sum(0) / (y == l).sum()
        p_c[l] = (y == l).sum() / len(y)  # P(c=l)
    print("ΘC: {}\n".format(p_c))
    print("θ(A1=0|C): {}\n".format({a[0]: 1 - a[1][0] for a in p_x_cond_c.items()}))
    print("θ(A1=1|C): {}\n".format({a[0]: a[1][0] for a in p_x_cond_c.items()}))
    return p_c, p_x_cond_c


def predict_naive_bayes(p_c, p_x_cond_c, new_x):
    '''
    Predict the label of a new individual x; returns a single label value
    '''
    # probability of new_x under each category l: P(c=l) * prod_i P(xi = new_x[i] | c=l)
    p_l = [(l, p_c[l] * np.multiply.reduce(p_x_cond_c[l] * new_x + (1 - p_x_cond_c[l]) * (1 - new_x)))
           for l in p_c.keys()]
    p_l.sort(key=lambda t: t[1], reverse=True)  # sort (label, probability) pairs by probability, descending
    return p_l[0][0]  # return the label with the largest probability


if __name__ == '__main__':
    tdata = np.loadtxt(train_file, dtype=int)
    x, y = tdata[:, 1:], tdata[:, 0]
    p_c, p_x_cond_c = train_naive_bayes(x, y)

    tdata = np.loadtxt(test_file, dtype=int)
    x, y = tdata[:, 1:], tdata[:, 0]
    predict = [predict_naive_bayes(p_c, p_x_cond_c, xi) for xi, yi in zip(x, y)]
    error = (y != predict).sum() / len(y)
    print("test error: {}\n".format(error))
[Machine learning classic algorithms and Python implementations: naive Bayes classification and its application in text categorization and spam detection]
Phi Blog
from:http://blog.csdn.net/pipisorry/article/details/52469064
Ref: [Probabilistic Graphical Models: Principles and Techniques, (US) Koller, Friedman]
PGM: The naive Bayes model in Bayesian networks