1. Intro
Naive Bayes is a method that uses prior probabilities to compute posterior probabilities. The word "naive" actually refers to a simplifying assumption, which is explained in the example below. Pure mathematical derivation certainly has the virtues of rigor and logic, but for non-mathematicians like me, not every step of a derivation is easy to follow thoroughly. So I will start with an example, rather like a word problem, to explain the naive Bayes classifier, in the hope that a concrete scene makes the formulas easier to understand.

2. Example
The song "Little Apple" has been very popular recently, so let us use apples as the example. Suppose an apple can be described by three features: size, weight, and color. Size takes the values big or small, weight takes the values light or heavy, and color takes the values red or green. An apple described by these three features is classified by its taste, which takes the values good or bad.
The problem the naive Bayes classifier solves is this: knowing the probabilities of apples tasting good and bad, if we are given the features of an apple, what are the probabilities that it tastes good or bad? This is a typical inverse probability problem.
Size:    big    small  big    big    small  small
Weight:  light  heavy  light  light  heavy  light
Colour:  red    red    red    green  red    green
Taste:   good   good   bad    bad    bad    good
The table above gives the features and tastes of 6 apples. For a big, heavy, red apple, can we estimate whether its taste is good or bad?
Let us first explain what "naive" means here: it is the assumption that the three features describing an apple are independent of each other. This assumption brings great convenience to the calculations that follow. Some readers will feel that, for this example, the assumption does not hold: size and weight are obviously two positively correlated features. That is true; the naive assumption is rarely satisfied exactly in the real world, but in practice the accuracy of predictions made under this assumption is within an acceptable range.
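Written out in the notation of this example, the assumption says that once the class (taste) is fixed, the joint probability of the three features factors into a product:

P(size, weight, color | taste) = P(size | taste) P(weight | taste) P(color | taste)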
3. Basic method

P(A|B) denotes the probability that event A occurs given that B has occurred. In practice we often care about P(B|A) but can only obtain P(A|B) directly, so we need a tool to convert between P(A|B) and P(B|A). Bayes' theorem is exactly such a formula:
P(B|A) = P(A|B) P(B) / P(A)
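As a quick sanity check on the apple table above: let A be "the apple is red" and B be "the taste is good". From the table, P(A|B) = P(red|good) = 2/3, P(B) = P(good) = 3/6 = 1/2, and P(A) = P(red) = 4/6 = 2/3, so P(B|A) = (2/3 × 1/2) / (2/3) = 1/2. Counting directly, 2 of the 4 red apples taste good, which is indeed 1/2.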
For the apple classification problem there are three features F = {f1, f2, f3} and two classes C = {c1, c2}. By Bayes' theorem, the probability that the class is ci given the observed features is
P(ci | f1 f2 f3) = P(f1 f2 f3 | ci) P(ci) / P(f1 f2 f3)
The ci that maximizes this expression is the classification result. Because P(f1 f2 f3) is a constant for a given training set, this is equivalent to finding the ci that maximizes

P(f1 f2 f3 | ci) P(ci)
Here the naive Bayes assumption comes into play: because the features are independent of each other given the class, the expression above can be rewritten as
P(f1 | ci) P(f2 | ci) P(f3 | ci) P(ci)
The whole problem becomes finding the ci that maximizes this product, and every term in the product can be estimated from the training set.
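To make this concrete, here is the calculation for the big, heavy, red apple from the question above, done by hand (the R implementation in the next section produces the same numbers). For the class good: P(big|good) P(heavy|good) P(red|good) P(good) = 1/3 × 1/3 × 2/3 × 1/2 = 1/27 ≈ 0.037. For the class bad: P(big|bad) P(heavy|bad) P(red|bad) P(bad) = 2/3 × 1/3 × 2/3 × 1/2 = 2/27 ≈ 0.074. Since 0.074 > 0.037, naive Bayes classifies this apple as bad.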
Finally, let us discuss Laplace calibration. If some feature value appears 0 times under a class in the training set, the formula above becomes meaningless, because the product for that class is 0 no matter what the other features are. Of course the training set can be chosen to avoid this situation, but when it cannot be avoided, Laplace calibration is needed. It is very simple: add 1 to the count of every feature value. That is Laplace calibration.
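As a hypothetical illustration (the training set above happens to have no zero counts): suppose none of the 3 good apples were green. Then P(green|good) = 0/3 = 0, and the product for the class good would be 0 for any green apple, no matter how favorable its other features. With Laplace calibration we add 1 to every count, so the estimate becomes P(green|good) = (0 + 1) / (3 + 2) = 0.2, where 2 is the number of values the color feature can take.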
4. R language implementation
# Naive Bayes
library(plyr)
library(reshape2)

## 1. Build the naive Bayes classifier from the training set
## 1.1 Class probabilities
## Compute the probability of each class in training set D, i.e. P(c_i)
## Input:  trainData     training set, a data frame
##         strClassName  name of the column in trainData that holds the class label
## Output: data frame with the P(c_i) collection: class name | probability (column prob)
class_prob <- function(trainData, strClassName) {
  # number of training samples
  length.train <- nrow(trainData)
  # ddply groups trainData by the class column and counts the frequency of each class (e.g. taste)
  dTemp <- ddply(trainData, strClassName, "nrow")
  # turn the counts from the previous step into relative frequencies
  dTemp$prob <- dTemp$nrow / length.train
  dTemp[, -2]
}

## 1.2 Conditional probability of each feature value under each class
## Compute, from training set D, the probability of each feature value under each class, i.e. P(f_i|c_i)
## Input:  trainData     training set, a data frame
##         strClassName  name of the class column; all remaining columns are treated as features
## Output: data frame with the P(f_i|c_i) collection:
##         class name | feature name | feature value | probability (column prob)
feature_class_prob <- function(trainData, strClassName) {
  # convert the wide table into a long table
  data.melt <- melt(trainData, id = c(strClassName))
  # count the frequency of each (class, feature name, feature value) combination
  aa <- ddply(data.melt, c(strClassName, "variable", "value"), "nrow")
  # within each class and feature, turn the counts into conditional probabilities
  bb <- ddply(aa, c(strClassName, "variable"), mutate,
              sum = sum(nrow), prob = nrow / sum)
  # add column names
  colnames(bb) <- c("class.name", "feature.name", "feature.value",
                    "feature.nrow", "feature.sum", "prob")
  # return the result
  bb[, c(1, 2, 3, 6)]
}
# feature_class_prob(iris, "Species")
# The code above builds the naive Bayes classifier

## 2. Predict with the generated naive Bayes classifier
## Input:  oneObs  data frame, the sample to predict, format: feature name | feature value
##         pc      data frame, class probabilities P(c_i) from the training set: class name | probability
##         pfc     data frame, conditional probabilities P(f_i|c_i) of feature values under each class
pre_class <- function(oneObs, pc, pfc) {
  colnames(oneObs) <- c("feature.name", "feature.value")
  colnames(pc)  <- c("class.name", "prob")
  colnames(pfc) <- c("class.name", "feature.name", "feature.value", "prob")
  # conditional probabilities of the observed feature values
  feature.all <- join(oneObs, pfc, by = c("feature.name", "feature.value"), type = "inner")
  # multiply the conditional probabilities together (prod is the product function)
  feature.prob <- ddply(feature.all, .(class.name), summarize, prob_fea = prod(prob))
  # attach the class probabilities
  class.all <- join(feature.prob, pc, by = "class.name", type = "inner")
  # output the result
  ddply(class.all, .(class.name), mutate, pre_prob = prob_fea * prob)[, c(1, 4)]
}

## 3. Test with data
# training set
train.apple <- data.frame(
  size   = c("big", "small", "big", "big", "small", "small"),
  weight = c("light", "heavy", "light", "light", "heavy", "light"),
  color  = c("red", "red", "red", "green", "red", "green"),
  taste  = c("good", "good", "bad", "bad", "bad", "good"))
# test sample
oneObs <- data.frame(
  feature.name  = c("size", "weight", "color"),
  feature.value = c("big", "heavy", "red"))
# predict on the test sample
pc  <- class_prob(train.apple, "taste")
pfc <- feature_class_prob(train.apple, "taste")
pre_class(oneObs, pc, pfc)
The result is:
  class.name   pre_prob
1        bad 0.07407407
2       good 0.03703704
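Note that these two numbers are the unnormalized scores P(f1 f2 f3 | ci) P(ci), not probabilities that sum to 1. If proper posterior probabilities are wanted, divide each score by their sum: 0.07407407 / (0.07407407 + 0.03703704) ≈ 0.67 for bad, and about 0.33 for good.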
We can see that the predicted taste of this apple is bad.

5. Summary of naive Bayes classification
1. It is supervised learning (it requires a training set);
2. It mainly handles discrete data; continuous data can be discretized first;
3. The feature values in the training set should be as complete as possible; if some are missing, preprocess with Laplace calibration (a sketch follows this list);
4. The assumption that the features are mutually independent is generally not satisfied in real problems, but predictions made under this assumption are acceptable.
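Here is a minimal sketch of one way Laplace calibration could be added to the classifier above. The function feature_class_prob_laplace below is only an illustrative variant of feature_class_prob (its name and structure are assumptions, not a fixed API), written in the same plyr/reshape2 style. Its output has the same columns as feature_class_prob, so it can be passed to pre_class in place of pfc.

# A sketch of P(f_i|c_i) estimation with add-one (Laplace) smoothing.
# Assumes plyr and reshape2 are already loaded, as in the code above.
feature_class_prob_laplace <- function(trainData, strClassName) {
  # wide table to long table: one row per (class, feature name, feature value) observation
  data.melt <- melt(trainData, id = c(strClassName))
  data.melt[] <- lapply(data.melt, as.character)
  colnames(data.melt) <- c("class.name", "feature.name", "feature.value")
  # observed counts of each (class, feature name, feature value) combination
  counts <- ddply(data.melt, c("class.name", "feature.name", "feature.value"), "nrow")
  # enumerate every possible combination, including those that never occur in the training set
  grid <- merge(data.frame(class.name = unique(data.melt$class.name),
                           stringsAsFactors = FALSE),
                unique(data.melt[, c("feature.name", "feature.value")]))
  all.counts <- merge(grid, counts, all.x = TRUE)
  all.counts$nrow[is.na(all.counts$nrow)] <- 0
  # add-one smoothing: (count + 1) / (samples in the class + number of values the feature takes)
  bb <- ddply(all.counts, c("class.name", "feature.name"), mutate,
              prob = (nrow + 1) / (sum(nrow) + length(nrow)))
  bb[, c("class.name", "feature.name", "feature.value", "prob")]
}

For the apple data this gives, for example, P(big|good) = (1 + 1) / (3 + 2) = 0.4 instead of 1/3; as the training set grows, the smoothed and unsmoothed estimates converge.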
Other introductions to naive Bayes can be found at:
http://www.ruanyifeng.com/blog/2013/12/naive_bayes_classifier.html
http://www.cnblogs.com/leoo2sk/archive/2010/09/17/1829190.html
In addition, for naive Bayes you can also read: Andrew Ng Machine Learning Notes (V): Generative Learning Algorithms and the Naive Bayes Algorithm.