Application of the Naive Bayes Algorithm in Spam Filtering (Bayesian Spam Filtering)
I have been writing a paper on big data classification recently (my advisor reminds me about it every day, like spam), so I borrowed several books on big data from the library. Today I read the section on spam in "New Internet Big Data Mining" (take a look if you are interested), which reminded me of a well-known company's interview question I saw in the 1280 community yesterday: "In real-time in-game chat, how do you filter out advertisements?" At the time, I thought of keyword filtering but did not think any further.
In fact, the Naive Bayes algorithm is one of the most commonly used approaches for both spam filtering and ad filtering.
Bayes' theorem is a theorem about the conditional probabilities (and marginal probabilities) of two random events A and B.
(See Wikipedia: http://zh.wikipedia.org/wiki/%E8%B4%9D%E5%8F%B6%E6%96%AF%E5%AE%9A%E7%90%86)
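For reference, the theorem can be written out as below. The second line applies it to spam filtering. The symbols S (the email is spam), H (the email is normal) and w (a given word appears in the email) are notation introduced here for illustration and are not taken from the book.

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}

P(S \mid w) = \frac{P(w \mid S)\, P(S)}{P(w \mid S)\, P(S) + P(w \mid H)\, P(H)}
```

The denominator of the second line is just P(w) expanded over the two classes, so only word frequencies within spam and within normal email, plus the overall share of each class, are needed.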
By learning from a large number of emails that have already been labeled as spam or normal, we can estimate how likely a new email is to be spam by comparing how often the same words occur in the two classes. The advantage is high accuracy; the disadvantage is that a large amount of historical data is required.
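The idea above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a production filter: the four training messages are made up, tokenization is a plain lowercase whitespace split, add-one (Laplace) smoothing is used so a word unseen in one class does not force the probability to zero, and words never seen in training are ignored. The names `train` and `spam_probability` are hypothetical.

```python
import math
from collections import Counter

# Toy labeled data (made up for illustration; a real filter needs far more).
SPAM = ["win free prize now", "free money click now"]
HAM = ["meeting at noon tomorrow", "lunch tomorrow with the team"]


def tokenize(text):
    # Assumption: lowercase whitespace split is enough for this sketch.
    return text.lower().split()


def train(spam_docs, ham_docs):
    """Count word occurrences per class and estimate the class priors."""
    spam_counts = Counter(w for d in spam_docs for w in tokenize(d))
    ham_counts = Counter(w for d in ham_docs for w in tokenize(d))
    total_docs = len(spam_docs) + len(ham_docs)
    return {
        "spam_counts": spam_counts,
        "ham_counts": ham_counts,
        "vocab": set(spam_counts) | set(ham_counts),
        "p_spam": len(spam_docs) / total_docs,
        "p_ham": len(ham_docs) / total_docs,
    }


def log_likelihood(words, counts, vocab_size):
    """Sum of log P(word | class) with add-one (Laplace) smoothing."""
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in words)


def spam_probability(model, text):
    """Return P(spam | text) under the naive independence assumption."""
    # Words never seen in training are skipped (a simplifying assumption).
    words = [w for w in tokenize(text) if w in model["vocab"]]
    v = len(model["vocab"])
    log_spam = math.log(model["p_spam"]) + log_likelihood(words, model["spam_counts"], v)
    log_ham = math.log(model["p_ham"]) + log_likelihood(words, model["ham_counts"], v)
    # Normalize in log space to avoid underflow on long messages.
    m = max(log_spam, log_ham)
    e_spam = math.exp(log_spam - m)
    e_ham = math.exp(log_ham - m)
    return e_spam / (e_spam + e_ham)


model = train(SPAM, HAM)
print(spam_probability(model, "free prize"))
print(spam_probability(model, "meeting tomorrow"))
```

With this toy data the first message scores above 0.5 and the second below it, so a simple threshold at 0.5 separates them. In practice the threshold and the amount of training data would need tuning, which is the "large amount of historical data" cost mentioned above.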
Naive Bayes algorithm problems
Are you using this to write software? My suggestion, since this was also my graduation project: you can do the computation in Excel, which is more convenient than writing a program from scratch, and then use VB to interact with the Excel file. There are not many concrete applications in everyday life.
I am using the Naive Bayes algorithm for information filtering and need a training set. I hope someone can provide the following
June 4 has already passed, and no one is going to answer you. It would be better if the OP (LZ) shared it with me ~