I have been reading about the naive Bayes classifier over the past two days. Here I will write a simple note based on my own understanding to sort out my ideas.
I. Introduction
1. What is a naive Bayes classifier? Naive Bayes Classifier: it is
indicate that the word appears several times in the document. This creates a training set. Now the naive Bayes method needs to compute the left-hand side of the equals sign in this famous formula. The meaning of the left-hand side is essentially: I have a document with these words in it; what is the probability that the document is classified as Category 1 under this condition? What is the probabili
attention to the fact that, in practice, more than one class may share the highest probability, or every class may have probability 0; in that case a class is generally selected at random as the result. Sometimes, though, this should be handled with care. For example, when using Bayes to identify spam, if the probabilities are the same, or even if the difference between the two probabilities is small, the message should be treated as non-spam, because the failure to identify the impact of s
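The cautious tie-breaking rule described above can be sketched as follows. This is only an illustration: the function name `classify_mail` and the `margin` value are assumptions, not something taken from the original article.

```python
def classify_mail(p_spam, p_ham, margin=0.1):
    """Label a message as spam only when the spam probability beats the
    non-spam probability by more than `margin` (a hypothetical threshold).
    Ties and near-ties fall back to "non-spam"."""
    if p_spam - p_ham > margin:
        return "spam"
    return "non-spam"


print(classify_mail(0.5, 0.5))    # equal probabilities
print(classify_mail(0.52, 0.48))  # small difference
print(classify_mail(0.9, 0.1))    # clear difference
```

Only the last call is labeled spam; the first two fall on the safe side.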
Main ideas:
1. Have a corpus
2. Count how often each word occurs and use the frequencies as the naive Bayes candidates.
3. Example:
The corpus contains phrases such as China, the people, the Chinese, and the republic.
Input: Chinese people love the People's Republic of China;
Split the input into words by taking the max (the score is obtained from the various distributions);
For example: solution1: Chinese people _ all Chinese people _
Naive Bayesian Classification (NBC) is the most basic classification method in machine learning. It serves as the baseline against which the classification performance of many other algorithms is compared, and those algorithms are evaluated relative to NBC. At the same time, the idea of Bayesian statistics appears everywhere in machine learning methods. Naive
Outlook   Yes  No | Temperature  Yes  No | Humidity  Yes  No | Windy  Yes  No | Play  Yes  No
Sunny     2    3  | Hot          2    2  | High      3    4  | False  6    2  |       9    5
Overcast  4    0  | Mild         4    2  | Normal    6    1  | True   3    3  |
Rainy     3    2  | Cool         3    1  |                   |                |
As shown in the above table, we will calculate whether to play when the conditions are sunny, cool,
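A minimal sketch of how the counts in the table turn into a naive Bayes decision. The sentence above is cut off after "sunny, cool,", so the query below uses only those two stated conditions; the function names are illustrative.

```python
from fractions import Fraction

# Yes/No counts copied from the table: COUNTS[feature][value] = (yes, no)
COUNTS = {
    "Outlook": {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)},
    "Temperature": {"Hot": (2, 2), "Mild": (4, 2), "Cool": (3, 1)},
    "Humidity": {"High": (3, 4), "Normal": (6, 1)},
    "Windy": {"False": (6, 2), "True": (3, 3)},
}
PLAY = (9, 5)  # total Yes days, total No days


def play_scores(query):
    """Return the unnormalized scores P(yes) * prod P(x|yes) and the same for no."""
    total = sum(PLAY)
    yes = Fraction(PLAY[0], total)
    no = Fraction(PLAY[1], total)
    for feature, value in query.items():
        n_yes, n_no = COUNTS[feature][value]
        yes *= Fraction(n_yes, PLAY[0])
        no *= Fraction(n_no, PLAY[1])
    return yes, no


def play_posterior(query):
    """Normalize the two scores into P(Play = Yes | query)."""
    yes, no = play_scores(query)
    return yes / (yes + no)


query = {"Outlook": "Sunny", "Temperature": "Cool"}
print(play_scores(query))     # (Fraction(1, 21), Fraction(3, 70))
print(play_posterior(query))  # 10/19
```

Any additional condition can be added to `query` with the same feature and value names used in the table. Note that no smoothing is applied here, so a zero count (such as Overcast with No) forces that class's score to 0.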
probability of B.
Bayesian Formula
The Bayesian formula provides a method to calculate the posterior probability P(A | B) from the prior probability P(A) together with P(B) and P(B | A).
Bayesian theorem is based on the following Bayesian formula:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
P(A | B) increases as P(A) and P(B | A) grow, and decreases as P(B) grows; that is, if B is more likely to be observed independently of A, then B provides less support for A.
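The behaviour described above can be checked numerically with a small sketch. The function name and the probabilities are made up for illustration only.

```python
def posterior(prior_a, likelihood_b_given_a, evidence_b):
    """Bayes' formula: P(A | B) = P(B | A) * P(A) / P(B)."""
    if not 0 < evidence_b <= 1:
        raise ValueError("P(B) must be in (0, 1]")
    return likelihood_b_given_a * prior_a / evidence_b


base = posterior(prior_a=0.2, likelihood_b_given_a=0.5, evidence_b=0.25)
higher_prior = posterior(prior_a=0.3, likelihood_b_given_a=0.5, evidence_b=0.25)
higher_evidence = posterior(prior_a=0.2, likelihood_b_given_a=0.5, evidence_b=0.5)

print(base, higher_prior, higher_evidence)
```

Raising P(A) raises the posterior, while raising P(B) lowers it, which matches the statement above.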
Naive
Example of Naive Bayes algorithm and Bayesian example
Application of Bayesian
A famous application of the Bayesian classifier is spam filtering. If you want to learn more about this, see Hackers & Painters or the corresponding chapter in The Beauty of Mathematics. For a basic Bayesian implementation, see the dataset in two folders, which contain normal mails and spam mails respectively, and e
] [,2]
setosa 0.246 0.1053856
versicolor 1.326 0.1977527
virginica 2.026 0.2746501
This is the conditional probability of the feature Petal.Width. In this Bayesian implementation, the feature is numeric data (with a fractional part), so we assume that its probability density follows a Gaussian distribution. For example, for the feature Petal.Width, the probability given class setosa follows a Gaussian distribution whose mean is 0.246 and whose standard deviation is 0.10
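As a rough sketch of how these numbers are used, the Gaussian density can be evaluated for each class with the mean and standard deviation printed above. The helper names are illustrative, and the comparison looks at the likelihood of this single feature only, ignoring class priors and the other features.

```python
import math

# (mean, standard deviation) of Petal.Width per class, copied from the output above
PETAL_WIDTH = {
    "setosa": (0.246, 0.1053856),
    "versicolor": (1.326, 0.1977527),
    "virginica": (2.026, 0.2746501),
}


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def most_likely_class(x):
    """Class whose Gaussian gives the highest density for Petal.Width = x."""
    return max(PETAL_WIDTH, key=lambda c: gaussian_pdf(x, *PETAL_WIDTH[c]))


for width in (0.2, 1.3, 2.3):
    print(width, most_likely_class(width))
```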
distribution characteristics, so that the data distribution is estimated incorrectly. In this case, the model performs very badly on the real test set (this phenomenon is called overfitting). But the model cannot be too simple either; otherwise, when the data distribution is more complex, the model is not enough to depict it (reflected in a very high error rate on the training set; this phenomenon is called underfitting). Overfitting indicates that the model used is more complex than the real data distribution, an
Part 1 Naive Bayes
Returning to the junk e-mail classification problem mentioned in the last lesson, there are two kinds of event models:
1.1. Multivariate Bernoulli Event Model
This is the model from the last lesson. Maintain a very long dictionary. For a sample (x, y), x_i = 0 or 1 indicates whether dictionary word i appears in the sample message, and y = 0 or 1 indicates whether the sample is spam. In this model, x_i takes only the values 0 or 1, so $x_{i} | y$ is Bernou
training samples. For example, if m1 of the m training samples have y = 1, then P(y = 1) = m1/m. However, I still cannot figure out how P(x | y) is computed.
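One common way to estimate both quantities for 0/1 features is simple counting within each class. The sketch below assumes binary features and adds add-one (Laplace) smoothing, which the note itself does not mention; the function name and toy data are hypothetical.

```python
def estimate_bernoulli_nb(X, y):
    """X: list of 0/1 feature lists, y: list of 0/1 labels.

    Returns P(y = 1) = m1 / m and phi[c][i], an estimate of P(x_i = 1 | y = c).
    """
    m = len(y)
    m1 = sum(y)
    prior_y1 = m1 / m
    n = len(X[0])
    phi = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        # Assumption: add-one smoothing, so no estimate is exactly 0 or 1
        phi[c] = [(sum(r[i] for r in rows) + 1) / (len(rows) + 2) for i in range(n)]
    return prior_y1, phi


# Toy data (made up): 4 samples, 2 binary features
X = [[1, 0], [1, 1], [0, 1], [0, 0]]
y = [1, 1, 0, 0]
prior_y1, phi = estimate_bernoulli_nb(X, y)
print(prior_y1)  # 0.5
print(phi)       # {0: [0.25, 0.5], 1: [0.75, 0.5]}
```

Under the naive Bayes assumption, P(x | y = c) is then the product over i of phi[c][i] when x_i = 1 and 1 - phi[c][i] when x_i = 0.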
Naive Bayes hypothesis: P(x1, x2, ..., xn | y) = P(x1 | y) ... P(xn | y), where x1, x2, ..., xn are the components of x; that is, the components are conditionally independent. When i ≠ j, P(xi | y, xj) = P(xi | y). If y is specified, the occurrence of xi is
user sends a request, we need to traverse the computed probability of every grid cell in the database and return the center point of the cell with the maximum probability. Assuming that each cell is 10*10 meters in size, all of Beijing contains about 160 million cells, so the traversal is very expensive. One way to improve computational efficiency is to first solve for an approximate spatial range based on the user's signal vector, and then calculate the probability of each cell in the
probability of an object (that is, the probability that the object belongs to a certain class), and then select the class with the maximum posterior probability as the class to which the object belongs. At present, there are four kinds of Bayesian classifiers: naive Bayesian classification, the TAN (Tree Augmented Bayes Network) algorithm, BAN (BN Augmented Naive
1. Overview
Naive Bayesian classification is a kind of Bayesian classifier. Bayesian classification is a statistical classification method that classifies using knowledge of probability and statistics. Its principle is to use the Bayesian formula to calculate, from the prior probability of an object, its posterior probability (that is, the probability that the object belongs to a certain class), and then select the class that has the maximum posterior probability as the class to which the object
Compared with "dictionary-based analysis", "machine learning" does not require a large annotated dictionary, but it does require a large amount of labeled data. For example, take the following sentence again, with its label being: quality of service: medium (three levels in total: good, medium, and poor) ╮(╯-╰)╭. That is machine learning: train a model with a large amount of labeled data, then enter a comment to determine its label level. Ningxin Reviews National Day activities, with 62 credit card can be 6.2 yuan
This article mainly introduces how to use the naive Bayesian algorithm in Python. It has good reference value. Let's take a look with the editor.
Again, here's why the title says "using" instead of "implementing":
First, the algorithms provided by professionals are better than ones we write ourselves, in both efficiency and accuracy.
Secondly, for those who are not good at maths, it is very painful to s
from numpy import ones

def setOfWords2VecMN(vocabList, inputSet):
    returnVec = [0] * len(vocabList)  # create a vector in which every element is 0
    for word in inputSet:
        if word in vocabList:
            returnVec[vocabList.index(word)] += 1  # count each occurrence of the word
    return returnVec

# naive Bayesian classifier training function
def trainNB0(trainMatrix, trainCategory):
    numTrainDocs = len(trainMatrix)
    numWords = len(trainMatrix[0])
    pAbusive = sum(trainCategory) / float(numTrainDocs)  # prior probability of class 1
    p0Num = ones(numWords); p1Num = ones(numWords)  # start counts at 1 to avoid zero probabilities
    # compute p(w0|1) p(w1|1), av
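Because the training function above is cut off, here is a self-contained sketch of how such a bag-of-words trainer and classifier are commonly completed. It is not the original author's code: the function names, the denominator initialization, the use of plain lists instead of NumPy, and the toy data are all assumptions made for illustration.

```python
import math


def bag_of_words(vocab, words):
    """Count how many times each vocabulary word occurs in `words`."""
    vec = [0] * len(vocab)
    for w in words:
        if w in vocab:
            vec[vocab.index(w)] += 1
    return vec


def train_multinomial_nb(train_matrix, train_labels):
    """Return per-class log word probabilities and the prior P(class = 1)."""
    num_words = len(train_matrix[0])
    p_class1 = sum(train_labels) / len(train_matrix)
    # Counts start at 1 (like ones(numWords) above) so no word gets probability 0.
    counts = {0: [1] * num_words, 1: [1] * num_words}
    totals = {0: num_words, 1: num_words}  # assumption: matching add-one denominator
    for vec, label in zip(train_matrix, train_labels):
        for i, c in enumerate(vec):
            counts[label][i] += c
        totals[label] += sum(vec)
    log_p = {c: [math.log(n / totals[c]) for n in counts[c]] for c in (0, 1)}
    return log_p, p_class1


def classify(vec, log_p, p_class1):
    """Compare log scores; assumes both classes occur in the training data."""
    s1 = math.log(p_class1) + sum(c * lp for c, lp in zip(vec, log_p[1]))
    s0 = math.log(1 - p_class1) + sum(c * lp for c, lp in zip(vec, log_p[0]))
    return 1 if s1 > s0 else 0


# Toy data (made up): label 1 = spam, 0 = normal
vocab = ["cheap", "pills", "meeting", "notes"]
docs = [["cheap", "pills", "cheap"], ["meeting", "notes"],
        ["cheap", "pills"], ["notes", "meeting", "meeting"]]
labels = [1, 0, 1, 0]
matrix = [bag_of_words(vocab, d) for d in docs]
log_p, p_class1 = train_multinomial_nb(matrix, labels)
print(classify(bag_of_words(vocab, ["cheap", "pills"]), log_p, p_class1))
print(classify(bag_of_words(vocab, ["meeting"]), log_p, p_class1))
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow when documents contain many words.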
The "Machine Learning in Action" series of posts are the blogger's notes from reading the book Machine Learning in Action, including an understanding of the algorithms and Python implementations of them. In addition, the blogger has the source code for all the algorithms in the book and the data files they use; there is need to messag