Bayes' theorem
Bayes' theorem describes how a subjective judgment about a probability distribution (the prior probability) is corrected in light of observed values; it plays an important role in probability theory.
The prior probability distribution (marginal probability) is a distribution based on prior judgment rather than on the observed sample, while the posterior probability (conditional probability) is the conditional distribution of the unknown parameters given both the sample and the prior distribution.
Bayes' formula:
P(A∩B) = P(A)*P(B|A) = P(B)*P(A|B)
Rearranging:
P(A|B) = P(B|A)*P(A) / P(B)
where
P(A) is the prior (or marginal) probability of A; it is called "prior" because it does not take B into account.
P(A|B) is the conditional probability of A given that B has occurred, also called the posterior probability of A.
P(B|A) is the conditional probability of B given that A has occurred; because it is conditioned on A it can be called the posterior probability of B, and in Bayes' theorem it plays the role of the likelihood.
P(B) is the prior (or marginal) probability of B, and acts as a normalizing constant.
P(B|A)/P(B) is called the standardised likelihood.
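As a quick numerical illustration of the formula, the posterior can be computed directly from the prior, likelihood and normalizing constant (the numbers below are invented for this sketch, not taken from the example later in the article):

# Minimal sketch of Bayes' theorem with assumed numbers.
p_a = 0.01          # prior P(A)
p_b_given_a = 0.9   # likelihood P(B|A)
p_b = 0.05          # marginal / normalizing constant P(B)
p_a_given_b = p_b_given_a * p_a / p_b   # posterior P(A|B)
print(p_a_given_b)                      # ≈ 0.18  (0.9 * 0.01 / 0.05)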
Naive Bayesian classification (Naive Bayes)
A naive Bayes classifier assumes that the feature attributes are conditionally independent of one another, given the class, when estimating the class-conditional probabilities.
First define:
x = {a1, a2, ...} is a sample vector, where each a[j] is a feature attribute
div = {d1 = [l1, u1], ...} is a partition of a feature attribute into intervals
class = {y1, y2, ...} is the set of categories a sample may belong to
Algorithm flow:
(1) From the distribution of categories in the sample set, compute the prior probability of each category, p(y[i]).
(2) For each category, compute the frequency with which each feature attribute falls into each interval: p(a[j] in d[k] | y[i]).
(3) For each sample, compute p(x|y[i]):
p(x|y[i]) = p(a[1] in d | y[i]) * p(a[2] in d | y[i]) * ...
All feature attributes of the sample are known, so the interval d to which each attribute belongs is known;
each p(a[k] in d | y[i]) is then read off from step (2), which gives p(x|y[i]).
(4) Apply Bayes' theorem:
p(y[i]|x) = ( p(x|y[i]) * p(y[i]) ) / p(x)
Since the denominator p(x) is the same for every category, only the numerators need to be compared.
p(y[i]|x) is the probability that the observed sample belongs to category y[i]; the category with the largest value is taken as the classification result.
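A minimal sketch of steps (1) through (4) for already-discretized feature values (so each interval d[k] is simply a value) is shown below; the toy samples and names are illustrative only, not the article's data set, and no smoothing is applied:

from collections import Counter, defaultdict

# Toy training set: each sample is ((a1, a2), class); values are illustrative only.
samples = [
    ((0, 0), 0), ((1, 0), 0), ((1, 1), 0),
    ((0, 0), 1), ((0, 1), 1), ((1, 1), 1),
]

# (1) prior p(y[i]) from the class frequencies in the sample set
label_counts = Counter(label for _, label in samples)
prior = {y: n / len(samples) for y, n in label_counts.items()}

# (2) counts of each feature value under each class, for p(a[j] = v | y[i])
cond = defaultdict(lambda: defaultdict(Counter))
for features, label in samples:
    for j, v in enumerate(features):
        cond[label][j][v] += 1

def likelihood(features, y):
    # (3) p(x | y) as the product of the per-feature conditional probabilities
    p = 1.0
    for j, v in enumerate(features):
        p *= cond[y][j][v] / label_counts[y]
    return p

def classify(features):
    # (4) the denominator p(x) is identical for all classes, so compare only
    # the numerators p(x | y) * p(y) and take the largest
    return max(prior, key=lambda y: likelihood(features, y) * prior[y])

print(classify((1, 0)))  # prints 0 for this toy data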
Example:
Training data set (20 samples; the left column lists the C = 0 samples, the right column the C = 1 samples):
{a1 = 0, a2 = 0, C = 0}    {a1 = 0, a2 = 0, C = 1}
{a1 = 0, a2 = 0, C = 0}    {a1 = 0, a2 = 0, C = 1}
{a1 = 0, a2 = 0, C = 0}    {a1 = 0, a2 = 0, C = 1}
{a1 = 1, a2 = 0, C = 0}    {a1 = 0, a2 = 0, C = 1}
{a1 = 1, a2 = 1, C = 0}    {a1 = 0, a2 = 0, C = 1}
{a1 = 1, a2 = 1, C = 0}    {a1 = 1, a2 = 0, C = 1}
{a1 = 1, a2 = 1, C = 0}    {a1 = 1, a2 = 0, C = 1}
{a1 = 1, a2 = 1, C = 0}    {a1 = 1, a2 = 1, C = 1}
{a1 = 1, a2 = 1, C = 0}    {a1 = 1, a2 = 1, C = 1}
{a1 = 1, a2 = 1, C = 0}    {a1 = 1, a2 = 1, C = 1}
Prior probability of each category:
P(C = 0) = 0.5
P(C = 1) = 0.5
Calculate the conditional probabilities for each feature attribute:
P(a1 = 0 | C = 0) = 0.3
P(a1 = 1 | C = 0) = 0.7
P(a2 = 0 | C = 0) = 0.4
P(a2 = 1 | C = 0) = 0.6
P(a1 = 0 | C = 1) = 0.5
P(a1 = 1 | C = 1) = 0.5
P(a2 = 0 | C = 1) = 0.7
P(a2 = 1 | C = 1) = 0.3
Test sample:
x = {a1 = 0, a2 = 1}
P(x | C = 0) = P(a1 = 0 | C = 0) * P(a2 = 1 | C = 0) = 0.3 * 0.6 = 0.18
P(x | C = 1) = P(a1 = 0 | C = 1) * P(a2 = 1 | C = 1) = 0.5 * 0.3 = 0.15
Compare the numerators P(C) * P(x | C):
P(C = 0) * P(x | C = 0) = 0.5 * 0.18 = 0.09
P(C = 1) * P(x | C = 1) = 0.5 * 0.15 = 0.075
Since 0.09 > 0.075, the test sample is classified as C = 0.
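The hand computation above can be checked with a few lines, using only the probability tables already listed for this example:

prior = {0: 0.5, 1: 0.5}                              # P(C = c)
p_a1 = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.5, 1: 0.5}}     # P(a1 = v | C = c)
p_a2 = {0: {0: 0.4, 1: 0.6}, 1: {0: 0.7, 1: 0.3}}     # P(a2 = v | C = c)

a1, a2 = 0, 1  # the test sample x
for c in (0, 1):
    # numerator of Bayes' theorem for class c
    print(c, prior[c] * p_a1[c][a1] * p_a2[c][a2])    # 0 -> 0.09, 1 -> 0.075 (up to float rounding)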
Python implementation
The training phase of the naive Bayes classifier computes the probability tables of steps (1) and (2); the classification phase computes steps (3) and (4) and picks the class with the largest value.
Below the same logic is implemented from scratch and encapsulated in a class with a train/classify interface:
from functools import reduce
from numpy import zeros

class NaiveBayesClassifier(object):
    def __init__(self):
        self.dataMat = list()
        self.labelMat = list()
        self.pLabel1 = 0
        self.p0Vec = list()
        self.p1Vec = list()

    def loadDataSet(self, filename):
        # Each line holds whitespace-separated feature values, with the label in the last column.
        fr = open(filename)
        for line in fr.readlines():
            lineArr = line.strip().split()
            dataLine = [float(i) for i in lineArr]
            label = dataLine.pop()  # pop the last column, which is the label
            self.dataMat.append(dataLine)
            self.labelMat.append(int(label))

    def train(self):
        dataNum = len(self.dataMat)
        featureNum = len(self.dataMat[0])
        # prior probability of class 1 (labels are assumed to be 0/1)
        self.pLabel1 = sum(self.labelMat) / float(dataNum)
        p0Num = zeros(featureNum)
        p1Num = zeros(featureNum)
        p0Denom = 1.0
        p1Denom = 1.0
        for i in range(dataNum):
            if self.labelMat[i] == 1:
                p1Num += self.dataMat[i]
                p1Denom += sum(self.dataMat[i])
            else:
                p0Num += self.dataMat[i]
                p0Denom += sum(self.dataMat[i])
        # per-feature conditional probability vectors for each class
        self.p0Vec = p0Num / p0Denom
        self.p1Vec = p1Num / p1Denom

    def classify(self, data):
        # numerator of Bayes' theorem for each class:
        # product of the per-feature terms times the class prior
        p1 = reduce(lambda x, y: x * y, data * self.p1Vec) * self.pLabel1
        p0 = reduce(lambda x, y: x * y, data * self.p0Vec) * (1.0 - self.pLabel1)
        if p1 > p0:
            return 1
        else:
            return 0

    def test(self):
        self.loadDataSet('TestNB.txt')
        self.train()
        print(self.classify([1, 2]))

if __name__ == '__main__':
    nb = NaiveBayesClassifier()
    nb.test()
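The contents of TestNB.txt are not shown in the article; a quick way to exercise the class without that file is to fill dataMat and labelMat directly. The values below are toy data made up purely for illustration:

nb = NaiveBayesClassifier()
nb.dataMat = [[1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [3.0, 3.0]]   # hypothetical feature rows
nb.labelMat = [0, 0, 1, 1]                                       # hypothetical 0/1 labels
nb.train()
print(nb.classify([1, 2]))   # prints 1 for this toy data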
MATLAB
MATLAB's Statistics and Machine Learning Toolbox provides built-in support for naive Bayes classifiers:
trainData = [0 1; -1 0; 2 2; 3 3; -2 -1; -4.5 -4; 2 -1; -1 -3];
group = [1 1 -1 -1 1 1 -1 -1]';
model = fitcnb(trainData, group)
testData = [5 2; 3 1; -4 -3];
predict(model, testData)
fitcnb is used to train the model, and predict is used to classify new samples.