Bayesian statistics is built on conditional and joint probability, so we start with probability, conditional probability, and joint probability, then Bayes' theorem, and finally a Bayesian application: spam filtering.
Probability: the likelihood that an event occurs. For example, the probability of a tossed coin landing heads is 50%, and the probability of rolling a 6 with a die is 1/6. We write this as P(A).
Conditional probability: the probability that an event occurs given that some other event has already occurred, for example the probability that a person buys a shirt given that they have already bought trousers. We write this as P(B|A), the probability that event B occurs given that event A has occurred.
Joint probability: the probability that several events occur together. For example, for two coin tosses the joint probability is P(AB) = P(A)P(B), provided the events are independent of each other; if they are not independent, the joint probability is P(AB) = P(A)P(B|A).
When P(B) = P(B|A), the events are independent of each other.
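To make the two product rules concrete, here is a minimal Python sketch; the two-coin case is the one mentioned above, while the card-drawing numbers are added purely for illustration:

from fractions import Fraction

# Independent events: two fair coin tosses, P(AB) = P(A) * P(B)
p_heads = Fraction(1, 2)
print(p_heads * p_heads)                       # 1/4

# Dependent events: drawing two aces without replacement,
# P(AB) = P(A) * P(B|A)
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
print(p_first_ace * p_second_ace_given_first)  # 1/221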
Bayes' theorem
We can derive conditional probabilities from joint probabilities: knowing P(AB) and P(A), the probability that B occurs given that event A has occurred is P(B|A) = P(AB)/P(A). But what if we want to compute P(A|B)?
Unfortunately, P(A|B) is not equal to P(B|A).
From the joint probability we know that the order of the product can be exchanged, i.e. P(AB) = P(BA). Expanding both sides gives P(A)P(B|A) = P(B)P(A|B), and we can clearly see the P(A|B) we want inside it:
P(A|B) = P(B|A)P(A)/P(B), which is Bayes' theorem.
P(A) is the prior probability, a probability we assume before the calculation, such as the 50% probability that a flipped coin lands heads.
P(B|A) is the likelihood, which is estimated from the observed data.
P(A|B) is the posterior probability, the result of updating the prior with the likelihood; this is what we want to compute.
P(B) is the probability that event B occurs regardless of A, called the normalizing constant. By the law of total probability, P(B) = P(B|A1)P(A1) + P(B|A2)P(A2) + ...
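Putting the four pieces together, here is a small worked example in Python; the numbers (50% of mail is spam, a given word appears in 40% of spam and 5% of ham) are invented purely to illustrate the formula:

# Hypothetical numbers, chosen only to illustrate Bayes' theorem
p_spam = 0.5                 # prior P(A)
p_ham = 0.5
p_word_given_spam = 0.4      # likelihood P(B|A)
p_word_given_ham = 0.05      # likelihood under the other class

# Normalizing constant P(B), by the law of total probability
p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham

# Posterior P(A|B) from Bayes' theorem
p_spam_given_word = p_word_given_spam * p_spam / p_word
print(p_spam_given_word)     # about 0.889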
Bayesian estimation
Maximum likelihood estimation can produce a probability of 0, which distorts the posterior calculation and biases the classification. To avoid this we use Bayesian estimation, that is, we add a correction parameter λ.
Bayesian estimate: P(B|A) = (P(AB) + λ)/(P(A) + Sλ), with λ >= 0, where S is the number of distinct values the random variable can take (in practice P(AB) and P(A) here are the raw event counts, and λ = 1 gives Laplace smoothing).
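A quick sketch of the effect of the correction, assuming λ = 1 and a variable with S = 3 possible values; the counts are made up:

import numpy as np

counts = np.array([0, 3, 7])        # observed counts of each value in one class
total = counts.sum()                # 10 observations
S = len(counts)                     # number of possible values of the variable
lam = 1.0                           # lambda = 1, i.e. Laplace smoothing

mle = counts / total                # maximum likelihood: first entry is 0
smoothed = (counts + lam) / (total + S * lam)

print(mle)       # [0.   0.3  0.7 ]
print(smoothed)  # [0.0769... 0.3076... 0.6153...]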
Spam filtering
Spam tends to contain certain characteristic words. We only need to find these words and, given that they appear, compute the probability that the mail is spam and the probability that it is not, then compare the sizes of
P(spam | w1,w2,w3,w4,w5,...) and P(not spam | w1,w2,w3,w4,w5,...). We cannot obtain these conditional probabilities directly, so Bayes comes into play again: first write down the joint probability, then expand it:
P(spam | w1,w2,w3,w4,w5,...) P(w1,w2,w3,w4,w5,...) = P(w1,w2,w3,w4,w5,... | spam) P(spam), which rearranges to
P(spam | w1,w2,w3,w4,w5,...) = P(w1,w2,w3,w4,w5,... | spam) P(spam) / P(w1,w2,w3,w4,w5,...)
P(not spam | w1,w2,w3,w4,w5,...) = P(w1,w2,w3,w4,w5,... | not spam) P(not spam) / P(w1,w2,w3,w4,w5,...)
P(spam) is the prior probability, taken here as 0.5; P(w1,w2,w3,w4,w5,... | spam) is the likelihood, computed from the given data. Since both expressions are divided by the same P(w1,w2,w3,w4,w5,...), the denominator cancels out and we only need to compare:
P(spam | w1,w2,w3,w4,w5,...) ∝ P(w1,w2,w3,w4,w5,... | spam) P(spam)
P(not spam | w1,w2,w3,w4,w5,...) ∝ P(w1,w2,w3,w4,w5,... | not spam) P(not spam)
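One practical note before the code: under the naive independence assumption, P(w1,w2,w3,w4,w5,... | spam) is treated as the product of the per-word probabilities P(w1|spam)P(w2|spam)..., and multiplying hundreds of small numbers underflows to zero in floating point, so the implementation below compares sums of logarithms instead. A minimal illustration (the numbers are made up):

import numpy as np

# 200 words, each with per-word probability 0.01 under the class
word_probs = np.full(200, 0.01)

print(word_probs.prod())          # underflows to 0.0 in float64
print(np.log(word_probs).sum())   # about -921, still usable for comparison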
import numpy as np
import re

pattern = re.compile(r'\w+')


class Bayes(object):
    def __init__(self, wordlist):
        # vocabulary shared by ham and spam
        self.wordslist = wordlist
        # number of ham / spam messages seen during training
        self.hamcnt = 0
        self.spamcnt = 0
        # per-word counts, initialized to one (the lambda correction above)
        self.pham = np.ones(len(self.wordslist))
        self.pspan = np.ones(len(self.wordslist))
        # total word counts per class, initialized to match the correction
        self.phamwordcnt = 2
        self.pspanwordcnt = 2

    def word_to_vector(self, word):
        # turn a message into a vector of word counts over the vocabulary
        tempvector = np.zeros(len(self.wordslist))
        for line in pattern.findall(word):
            if line in self.wordslist:
                tempvector[self.wordslist.index(line)] += 1.0
        return tempvector

    def set_tran_data(self, word, flag):
        # accumulate training counts; flag=True means ham, False means spam
        vector = self.word_to_vector(word.strip())
        if flag:
            self.pham += vector
            self.phamwordcnt += sum(vector)
            self.hamcnt += 1.0
        else:
            self.pspan += vector
            self.pspanwordcnt += sum(vector)
            self.spamcnt += 1.0

    def classifiy(self, word):
        # compare log P(ham) + sum(log P(w|ham)) with the spam counterpart
        vector = self.word_to_vector(word)
        pa = self.hamcnt / (self.hamcnt + self.spamcnt)   # prior P(ham)
        pb = self.spamcnt / (self.hamcnt + self.spamcnt)  # prior P(spam)
        panum = sum(np.log(self.pham / self.phamwordcnt) * vector)
        pbnum = sum(np.log(self.pspan / self.pspanwordcnt) * vector)
        if np.log(pa) + panum > np.log(pb) + pbnum:
            return 1    # classified as ham
        else:
            return -1   # classified as spam


if __name__ == "__main__":
    # build the vocabulary from ham and spam messages 1-19
    hamlist = [item for i in range(1, 20)
               for item in open(r'C:\Users\Administrator\Desktop\machinelearninginaction\Ch04\email\ham\%s.txt' % i, 'r').readlines()]
    spamlist = [item for i in range(1, 20)
                for item in open(r'C:\Users\Administrator\Desktop\machinelearninginaction\Ch04\email\spam\%s.txt' % i, 'r').readlines()]
    wordlist1 = [word for line in hamlist for word in pattern.findall(line) if len(word) > 2]
    wordlist2 = [word for line in spamlist for word in pattern.findall(line) if len(word) > 2]
    wordlist1.extend(wordlist2)
    temp = Bayes(list(set(wordlist1)))

    # train on the same messages
    tranhamlist = [open(r'C:\Users\Administrator\Desktop\machinelearninginaction\Ch04\email\ham\%s.txt' % i, 'r').read()
                   for i in range(1, 20)]
    transpamlist = [open(r'C:\Users\Administrator\Desktop\machinelearninginaction\Ch04\email\spam\%s.txt' % i, 'r').read()
                    for i in range(1, 20)]
    for line in tranhamlist:
        temp.set_tran_data(line, True)
    for line in transpamlist:
        temp.set_tran_data(line, False)

    # classify held-out ham messages 21-25
    testlist = [open(r'C:\Users\Administrator\Desktop\machinelearninginaction\Ch04\email\ham\%s.txt' % i, 'r').read()
                for i in range(21, 26)]
    for line in testlist:
        print(temp.classifiy(line))
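In this script, classifiy returns 1 when a message is judged ham and -1 when it is judged spam, so running it on the held-out ham messages 21-25 should mostly print 1. The np.ones initialization of the per-word counters (together with the word-count totals starting at 2) plays the role of the λ correction described above, so a word never seen in one class during training cannot zero out the whole product.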
The data comes from Chapter 4 of Machine Learning in Action.
Machine learning, part 1: Bayes' theorem and its application