Basic Formula
Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
If b1, b2, ..., bn are conditionally independent of each other given A, then: P(b1, b2, ..., bn | A) = P(b1|A) * P(b2|A) * ... * P(bn|A)
Data (fictitious)
A1 A2 A3 A4 A5 B
1 1 1 1 3 no
1 1 1 2 2 soft
1 1 2 1 3 no
1 1 2 2 1 hard
1 2 1 1 2 no
1 2 1 2 3 soft
1 2 2 1 1 no
1 2 2 2 2 hard
2 1 1 1 3 no
2 1 1 2 3 soft
2 1 2 1 1 no
2 1 2 2 1 hard
2 2 1 1 2 no
2 2 1 2 3 soft
2 2 2 1 2 soft
2 2 2 2 2 hard
3 1 1 1 1 no
3 1 1 2 2 soft
3 1 2 1 1 no
3 1 2 2 1 hard
3 2 1 1 3 soft
3 2 1 2 1 soft
3 2 2 1 2 no
3 2 2 2 3 no
The table has five features (A1 to A5) and one label (B).
Algorithm Steps
1. Compute the probabilities from the training set:
(1) The prior probability of each class:
P(b="hard"), P(b="soft"), P(b="no")
(2) The conditional probability of each feature value given each class:
P(a1="1" | b="hard"), P(a1="2" | b="hard"), P(a1="3" | b="hard");
P(a2="1" | b="hard"), P(a2="2" | b="hard"), ...
P(a1="1" | b="soft"), P(a1="2" | b="soft"), P(a1="3" | b="soft");
P(a2="1" | b="soft"), P(a2="2" | b="soft"), ...
P(a1="1" | b="no"), P(a1="2" | b="no"), P(a1="3" | b="no");
P(a2="1" | b="no"), P(a2="2" | b="no"), ...
2. Classify the test data using Bayes' theorem:
Compute P(b="hard" | test_a), P(b="soft" | test_a) and P(b="no" | test_a). Under the independence assumption, each one is proportional to P(b) * P(a1|b) * P(a2|b) * ... * P(a5|b); the denominator P(test_a) is the same for every class, so it does not affect the comparison. The class with the largest probability is the classification result of the naive Bayes classifier.
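The two steps above can be sketched end to end on the fictitious table. This is only a minimal sketch under stated assumptions, not the implementation that follows: it assumes Python 3, and the names TABLE, cond and scores are made up for illustration. It also divides each score by the sum of all scores, which is the normalization by P(test_a) that the comparison in step 2 does not strictly need.

```python
from collections import Counter, defaultdict

# The 24 rows of the fictitious table: a1..a5 followed by the label b.
TABLE = """\
1 1 1 1 3 no
1 1 1 2 2 soft
1 1 2 1 3 no
1 1 2 2 1 hard
1 2 1 1 2 no
1 2 1 2 3 soft
1 2 2 1 1 no
1 2 2 2 2 hard
2 1 1 1 3 no
2 1 1 2 3 soft
2 1 2 1 1 no
2 1 2 2 1 hard
2 2 1 1 2 no
2 2 1 2 3 soft
2 2 2 1 2 soft
2 2 2 2 2 hard
3 1 1 1 1 no
3 1 1 2 2 soft
3 1 2 1 1 no
3 1 2 2 1 hard
3 2 1 1 3 soft
3 2 1 2 1 soft
3 2 2 1 2 no
3 2 2 2 3 no"""

rows = [line.split() for line in TABLE.splitlines()]

# Step 1 (1): class priors P(b).
label_counts = Counter(row[-1] for row in rows)
priors = {b: n / len(rows) for b, n in label_counts.items()}

# Step 1 (2): cond[b][i][v] counts the rows with label b whose feature i equals v.
cond = defaultdict(lambda: defaultdict(Counter))
for row in rows:
    for i, v in enumerate(row[:-1]):
        cond[row[-1]][i][v] += 1


def scores(sample):
    """Return P(b) * product of P(a_i | b) for every label b (unnormalized)."""
    out = {}
    for b in priors:
        s = priors[b]
        for i, v in enumerate(sample):
            # A value never seen with label b has count 0, so the score becomes 0.
            s *= cond[b][i][v] / label_counts[b]
        out[b] = s
    return out


# Step 2: score one sample, normalize, and pick the most probable label.
s = scores(['3', '1', '2', '2', '1'])
total = sum(s.values())
posteriors = {b: v / total for b, v in s.items()}
best = max(s, key=s.get)
print(best, posteriors)
```

For the sample 3 1 2 2 1, which also appears in the table with the label hard, the largest score belongs to hard.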
Code Implementation
def train(dataset, labels):
    # Each row of dataset is [a1, ..., a5, label]; labels repeats the last column.
    uniquelabels = set(labels)
    res = {}
    for label in uniquelabels:
        res[label] = []
        # Prior P(B = label)
        res[label].append(labels.count(label) / float(len(labels)))
        for i in range(len(dataset[0]) - 1):
            # Get the values of Ai among the rows with this label
            tempcols = [l[i] for l in dataset if l[-1] == label]
            uniquevalues = set(tempcols)
            valueprobs = {}
            for value in uniquevalues:
                count = tempcols.count(value)
                # Compute P(a | B)
                prob = count / float(labels.count(label))
                valueprobs[value] = prob
            res[label].append(valueprobs)
    return res

def test(testvect, probmat):
    hard = probmat['hard']
    soft = probmat['soft']
    no = probmat['no']
    # Start from the priors, then multiply in one conditional per feature.
    phard = hard[0]
    psoft = soft[0]
    pno = no[0]
    for i in range(len(testvect)):
        # A value missing from the dictionary has probability 0 for that class.
        if testvect[i] in hard[i + 1]:
            phard *= hard[i + 1][testvect[i]]
        else:
            phard = 0
        if testvect[i] in soft[i + 1]:
            psoft *= soft[i + 1][testvect[i]]
        else:
            psoft = 0
        if testvect[i] in no[i + 1]:
            pno *= no[i + 1][testvect[i]]
        else:
            pno = 0
    res = {}
    res['hard'] = phard
    res['soft'] = psoft
    res['no'] = pno
    print(phard, psoft, pno)
    # Return the label with the largest score.
    return max(res, key=res.get)
Get Data
def loaddataset(filename):
    # Read whitespace-separated rows; the last column of each row is the label.
    with open(filename) as fr:
        arrayolines = fr.readlines()
    returnmat = []
    labels = []
    for line in arrayolines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        listfromline = line.split()
        labels.append(listfromline[-1])
        returnmat.append(listfromline)
    return returnmat, labels
Calculating Probabilities from the Training Set
The train function returns res, a dictionary that stores all of the probability values described in step 1 of the algorithm above. The dictionary structure is as follows:
{
 'hard': [P(b="hard"),
          {'1': P(a1="1" | b="hard"), '2': P(a1="2" | b="hard"), '3': P(a1="3" | b="hard")},
          {'1': P(a2="1" | b="hard"), '2': P(a2="2" | b="hard")},
          {'1': P(a3="1" | b="hard"), '2': P(a3="2" | b="hard")},
          {'1': P(a4="1" | b="hard"), '2': P(a4="2" | b="hard")},
          {'1': P(a5="1" | b="hard"), '2': P(a5="2" | b="hard"), '3': P(a5="3" | b="hard")}],
 'soft': [P(b="soft"),
          {'1': P(a1="1" | b="soft"), '2': P(a1="2" | b="soft"), '3': P(a1="3" | b="soft")},
          {'1': P(a2="1" | b="soft"), '2': P(a2="2" | b="soft")},
          {'1': P(a3="1" | b="soft"), '2': P(a3="2" | b="soft")},
          {'1': P(a4="1" | b="soft"), '2': P(a4="2" | b="soft")},
          {'1': P(a5="1" | b="soft"), '2': P(a5="2" | b="soft"), '3': P(a5="3" | b="soft")}],
 'no':   [P(b="no"),
          {'1': P(a1="1" | b="no"), '2': P(a1="2" | b="no"), '3': P(a1="3" | b="no")},
          {'1': P(a2="1" | b="no"), '2': P(a2="2" | b="no")},
          {'1': P(a3="1" | b="no"), '2': P(a3="2" | b="no")},
          {'1': P(a4="1" | b="no"), '2': P(a4="2" | b="no")},
          {'1': P(a5="1" | b="no"), '2': P(a5="2" | b="no"), '3': P(a5="3" | b="no")}]
}
If a probability is 0, the dictionary does not contain the corresponding key-value pair.
Calculating the Classification Probability of Test Samples
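Because zero probabilities are simply absent from the dictionary, every lookup needs a fallback value. The following is a minimal sketch, assuming the dictionary structure described above; the function name classify and the two-feature toy_probmat are hypothetical and exist only for illustration. It loops over whatever labels the dictionary contains instead of naming each class.

```python
def classify(testvect, probmat):
    """Return (best_label, scores) for one sample.

    probmat is assumed to have the form
    {label: [prior, {value: P(a1=value | label)}, {value: P(a2=value | label)}, ...]}
    """
    scores = {}
    for label, entry in probmat.items():
        score = entry[0]  # prior P(B = label)
        for i, value in enumerate(testvect):
            # A missing key means the value never occurred with this label,
            # so its conditional probability is treated as 0.
            score *= entry[i + 1].get(value, 0.0)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical two-feature probability table, for illustration only.
toy_probmat = {
    'hard': [0.25, {'1': 0.5, '2': 0.5}, {'2': 1.0}],
    'no': [0.75, {'1': 1.0}, {'1': 0.5, '2': 0.5}],
}
label, scores = classify(['2', '2'], toy_probmat)
print(label, scores)
```

In this toy table the value '2' never occurs for the first feature under 'no', so that class scores 0 despite its larger prior.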
Test Results
dataset, labels = loaddataset("Dataset.txt")
probmat = train(dataset, labels)
res = test(['3', '1', '2', '2', '1'], probmat)
print(res)