PGM: The Naive Bayes Model as a Bayesian Network


http://blog.csdn.net/pipisorry/article/details/52469064

Exploiting independence properties

Combining conditional parameterization with conditional independence assumptions yields a very compact representation of high-dimensional probability distributions.

Independence of random variables

[PGM: Basics of probability theory: exploiting independence properties]

Conditional Parameterization Method


Note: p(I), p(S|i0), and p(S|i1) are all Bernoulli (two-valued) distributions, each requiring only one parameter.




The naive Bayes model
A student example of the naive Bayes model

{This example is a good illustration of what a naive Bayes model is; its generalized definition and a classification application follow.}

Problem description

Factor representation of the model

Notation: I denotes IQ (intelligence); S denotes the SAT score; G denotes the grade in a certain course.


Note: this model is represented by 3 Bernoulli (two-valued) distributions and 2 three-valued distributions.
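To make the saving concrete, here is a small parameter count for the student example (a sketch; the cardinalities are taken from the example above: I and S binary, G with three values):

```python
# Full joint table over (I, S, G): 2 * 2 * 3 = 12 entries, i.e. 11 free parameters.
full_joint_params = 2 * 2 * 3 - 1

# Factored form P(I) * P(S|I) * P(G|I):
# P(I): 1 parameter; P(S|i0) and P(S|i1): 1 each; P(G|i0) and P(G|i1): 2 each.
factored_params = 1 + 2 * 1 + 2 * 2

print(full_joint_params, factored_params)  # 11 vs 7
```

The gap widens rapidly as more conditionally independent features are added.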

Benefits of factor representation

[Probabilistic Graphical Models: Principles and Techniques, (US) Koller & Friedman]




Generalized definition of the naive Bayes model


Note: corresponding to the student example above: once the class variable C (IQ I in the example) is given, the features of the class (Grade and SAT in the example) are independent of each other (this is in fact the tail-to-tail structure in a Bayesian network).

The Bayesian network of the naive Bayes model:


Factor decomposition and parameters of naive Bayesian model
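In standard notation (a sketch, writing $C$ for the class variable and $X_1, \dots, X_n$ for the features, matching the tail-to-tail structure noted above), the naive Bayes factorization is:

```latex
P(C, X_1, \dots, X_n) = P(C) \prod_{i=1}^{n} P(X_i \mid C)
```

If $C$ takes $k$ values and each $X_i$ takes $v_i$ values, this factored model needs $(k-1) + k\sum_i (v_i - 1)$ parameters instead of the $k \prod_i v_i - 1$ required by the full joint table.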


Using naive Bayesian models for classification


That is, training a naive Bayes classifier mainly means estimating the parameters P(c) {each independent P(ci)} and P(x|c) {each independent P(xi|ci) = num(xi=i, ci=i) / num(ci=i)}. These parameters can be computed directly from the training data as frequencies (the MLE method). Prediction then assigns new_x the category c that maximizes P(c|new_x).
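The two steps above can be worked through by hand on a tiny, made-up binary dataset (all numbers here are purely illustrative, not from the post):

```python
import numpy as np

# Toy training set: column 0 is the class c, the rest are binary features x.
data = np.array([
    [0, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
    [1, 0, 1],
    [1, 1, 1],
])
x, y = data[:, 1:], data[:, 0]

# MLE by frequency: P(c) and P(xi=1 | c).
p_c = {c: (y == c).mean() for c in np.unique(y)}
p_x1_c = {c: x[y == c].mean(axis=0) for c in np.unique(y)}

# Classify new_x by argmax_c P(c) * prod_i P(xi | c).
new_x = np.array([0, 1])
scores = {c: p_c[c] * np.prod(np.where(new_x == 1, p_x1_c[c], 1 - p_x1_c[c]))
          for c in p_c}
print(max(scores, key=scores.get))  # the predicted class
```

Here class 1 wins: P(c=1) * P(x1=0|1) * P(x2=1|1) = 0.5 * 2/3 * 1 = 1/3, versus 1/18 for class 0.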


Advantages and disadvantages of naive Bayesian classification algorithm

Pros: still effective with little data; can handle multi-class problems

Cons: sensitive to how the input data is prepared

Applicable data type: nominal data

The naive Bayes method needs no structure learning, its network structure is trivial to build, and experiments and practice show that its classification performance is good.

In practical applications, however, the naive Bayes classifier's strong assumption, that the individual attributes are mutually independent, is hard to satisfy. This independence should be understood in a broad sense: conditional independence between the attribute variables means that the dependencies among the attributes are negligible relative to the dependency between each attribute and the class variable. This is one of the main reasons why the range over which the naive Bayes classifier is optimal is much larger than one might imagine.

With its simple structure and good performance, the naive Bayes classifier has attracted much attention and is one of the best classifiers. In theory it is optimal when its assumption is satisfied, but since that assumption is strong, one can try to weaken it to enlarge the optimal range and obtain better classifiers; naive Bayes can be extended to the generalized naive Bayes classifier.

A Python implementation of the naive Bayes classification algorithm

#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
__title__ = 'naive Bayes algorithm (also applicable to multi-class classification)'
__author__ = 'pika'
__mtime__ = '16-5-23'
__email__ = '[email protected]'
"""
import numpy as np

TRAIN_FILE = r'./trainingdata.txt'
TEST_FILE = r'./testingdata.txt'


def train_naive_bayes(x, y):
    '''
    Train the parameters: p(c) {each independent p(ci)} and p(x|c) {each independent p(xi|ci)}.
    '''
    p_c = {}         # p(c) = {ci: p(ci)}
    p_x_cond_c = {}  # p(x|c) = {ci: [p(xi=1|ci)]}
    for l in np.unique(y):
        # For label l, the array [p(xi=1|c=l)]; then 1 - array gives [p(xi=0|c=l)].
        p_x_cond_c[l] = x[y == l].sum(0) / (y == l).sum()
        p_c[l] = (y == l).sum() / len(y)  # p(c=l)
    print("theta_c: {}\n".format(p_c))
    print("theta_a1=0|c: {}\n".format({c: 1 - p[0] for c, p in p_x_cond_c.items()}))
    print("theta_a1=1|c: {}\n".format({c: p[0] for c, p in p_x_cond_c.items()}))
    return p_c, p_x_cond_c


def predict_naive_bayes(p_c, p_x_cond_c, new_x):
    '''
    Predict the label of a single new example x; returns a single label value.
    '''
    # Probability of new_x under each category l.
    p_l = [(l, p_c[l] * np.multiply.reduce(p_x_cond_c[l] * new_x
                                           + (1 - p_x_cond_c[l]) * (1 - new_x)))
           for l in p_c.keys()]
    p_l.sort(key=lambda t: t[1], reverse=True)  # sort categories by probability
    return p_l[0][0]  # label with the largest probability


if __name__ == '__main__':
    tdata = np.loadtxt(TRAIN_FILE, dtype=int)
    x, y = tdata[:, 1:], tdata[:, 0]
    p_c, p_x_cond_c = train_naive_bayes(x, y)

    tdata = np.loadtxt(TEST_FILE, dtype=int)
    x, y = tdata[:, 1:], tdata[:, 0]
    predict = [predict_naive_bayes(p_c, p_x_cond_c, xi) for xi in x]
    error = (y != np.array(predict)).sum() / len(y)
    print("test error: {}\n".format(error))
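The trainingdata.txt and testingdata.txt files are not included with the post. Judging from the np.loadtxt calls, each row should hold whitespace-separated integers with the label in column 0 and the binary features after it; a compatible toy file (made-up data) can be built like this:

```python
import numpy as np

# Made-up rows in the inferred format: "label feature1 feature2 ...".
rows = ["1 1 0 1",
        "0 0 1 0",
        "1 1 1 1"]
with open("toy.txt", "w") as f:
    f.write("\n".join(rows))

tdata = np.loadtxt("toy.txt", dtype=int)
x, y = tdata[:, 1:], tdata[:, 0]
print(y.tolist())  # the labels, column 0
print(x.shape)     # (samples, features)
```

Pointing TRAIN_FILE and TEST_FILE at files in this layout is enough to run the script end to end.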

[Machine learning classic algorithms and Python implementations: naive Bayes classification and its application to text categorization and spam detection]



Ref: [Probabilistic Graphical Models: Principles and Techniques, (US) Koller & Friedman]

