[Machine learning & Data mining] Naive Bayes: mathematical principles

Source: Internet
Author: User

1. Preliminaries:

(1) Prior probability: the probability obtained from past experience and analysis, i.e. the probability known before any "result" is observed. In the total probability formula it plays the role of reasoning "from cause to effect".

(2) Posterior probability: the probability re-estimated after the "result" information has been obtained. It is usually a conditional probability (but not every conditional probability is a posterior probability). In the Bayes formula it plays the role of reasoning "from effect back to cause".

For example: a batch of parts is processed by two machines, A and B. A processes 60% of the parts and B processes 40%; A produces a defective part with probability 0.1, and B with probability 0.15. Finding the probability that a randomly chosen part is defective is a prior-probability (total probability) question; once we already know that a part is defective, finding the probability that it was processed by A (or by B) is a posterior-probability question.
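To make the example concrete, here is a minimal Python sketch of the computation (the variable names are mine; the numbers come from the example above):

    # Parts example: A makes 60% of the parts, B makes 40%;
    # A's defect rate is 0.1, B's is 0.15.
    p_a, p_b = 0.6, 0.4            # prior probabilities P(A), P(B)
    p_d_a, p_d_b = 0.10, 0.15      # conditional probabilities P(D|A), P(D|B)

    # Total probability formula: P(D) = P(A)P(D|A) + P(B)P(D|B)
    p_d = p_a * p_d_a + p_b * p_d_b              # 0.06 + 0.06 = 0.12

    # Bayes formula: P(A|D) = P(A)P(D|A) / P(D)
    p_a_given_d = p_a * p_d_a / p_d              # 0.06 / 0.12 = 0.5
    p_b_given_d = p_b * p_d_b / p_d              # 0.06 / 0.12 = 0.5

    print(round(p_d, 2), round(p_a_given_d, 2), round(p_b_given_d, 2))
    # 0.12 0.5 0.5

Knowing the two priors and the two conditional defect rates is enough to reverse the direction of the conditioning, which is exactly the "from effect back to cause" reading of the posterior.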

(3) Total probability formula: let E be a random experiment and B1, B2, ..., Bn be mutually exclusive events of E with P(Bi) > 0 (i = 1, 2, ..., n) and B1 ∪ B2 ∪ ... ∪ Bn = S (the whole sample space). If A is an event of E, then

P(A) = P(B1)P(A|B1) + P(B2)P(A|B2) + ... + P(Bn)P(A|Bn)

(4) Bayes formula: let E be a random experiment and B1, B2, ..., Bn be mutually exclusive events of E with P(Bi) > 0 (i = 1, 2, ..., n) and B1 ∪ B2 ∪ ... ∪ Bn = S. If the event A of E satisfies P(A) > 0, then

P(Bi|A) = P(Bi)P(A|Bi) / (P(B1)P(A|B1) + P(B2)P(A|B2) + ... + P(Bn)P(A|Bn))

(5) Conditional probability formula: P(A|B) = P(AB)/P(B)

(6) Maximum likelihood estimation: in machine learning, maximum likelihood estimation corresponds to minimizing the empirical risk. The general procedure (for a discrete distribution) is: write down the likelihood function (the joint probability distribution of the sample), which is a function of the parameter to be estimated; take its logarithm; differentiate with respect to the parameter; set the derivative equal to 0 and solve. The resulting value is the maximum likelihood estimate of the parameter.
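A minimal worked case of this procedure (a 0-1/Bernoulli distribution; the example is mine, not from the original post): suppose n independent draws x1, ..., xn ∈ {0, 1} contain k ones, and the parameter to estimate is p = P(X=1).

        Likelihood:            L(p) = p^k (1-p)^(n-k)
        Log-likelihood:        log L(p) = k·log p + (n-k)·log(1-p)
        Derivative set to 0:   k/p - (n-k)/(1-p) = 0  =>  p̂ = k/n

So the maximum likelihood estimate is just the sample frequency; the naive Bayes estimates in section 2 <6> below have exactly this count-and-divide form.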

Note: Empirical risk: to measure how good a model is, a loss function is introduced; common loss functions are the 0-1 loss, squared loss, absolute loss, and log loss. The risk function (expected risk) is the expectation of the loss function, taken with respect to the joint distribution of the data. In practice that joint distribution cannot be obtained, and the expectation can only be estimated from the sample; this gives the empirical risk, i.e. the average loss over the sample. By the law of large numbers, as the sample size tends to infinity the empirical risk approaches the expected risk.
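As a small illustration of the note (a sketch under my own naming, not code from the original):

    # Empirical risk = the average loss over a finite sample.
    # The 0-1 loss from the note above is used as the loss function.
    def zero_one_loss(y_true, y_pred):
        return 0 if y_true == y_pred else 1

    def empirical_risk(ys_true, ys_pred, loss=zero_one_loss):
        return sum(loss(t, p) for t, p in zip(ys_true, ys_pred)) / len(ys_true)

    # 3 of the 5 predictions are wrong -> empirical risk 3/5 = 0.6
    print(empirical_risk([0, 1, 1, 0, 1], [1, 1, 0, 0, 0]))   # 0.6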

2. The naive Bayes algorithm

(1) Idea: the "naive" in naive Bayes lies in the assumption that the individual components of the input vector (X1, X2, ..., Xn) are independent of one another given the class, so that P(X1=x1, X2=x2, ..., Xn=xn | Y=c) = P(X1=x1|Y=c)·P(X2=x2|Y=c)·...·P(Xn=xn|Y=c). Beyond that, the algorithm is based on Bayes' theorem: for a given training data set, it first learns the joint probability distribution P(X, Y) under the feature conditional independence assumption, and then, for a given input vector, uses the Bayes formula to output the class label with the largest posterior probability.

(2) Details: the naive Bayes computation is easiest to lay out through the process of determining the class of an input vector x.

<1> To determine the class of the input vector x, compute the probability of each class y under the condition x; the value of y that attains the largest probability is the classification of x. The probability in question is P(Y=ck | X=x).

<2> Derive the Bayes formula from the conditional probability formula (this step is not strictly necessary; I am used to simply remembering the Bayes formula).

By the conditional probability formula, P(Y=ck | X=x) = P(Y=ck, X=x)/P(X=x) = P(X=x | Y=ck)·P(Y=ck)/P(X=x)

Replacing P(X=x) by the total probability formula gives:

        P(Y=ck | X=x) = P(X=x | Y=ck)·P(Y=ck) / Σ(k=1..K) P(X=x | Y=ck)·P(Y=ck)

<3> Because of the "naivety" of naive Bayes, the feature components are conditionally independent of one another given the class, so the following formula is obtained:

        P(X=x | Y=ck) = ∏(j=1..n) P(X(j)=x(j) | Y=ck)

where x(j) denotes the j-th component of the vector x.

<4> Substituting the formula from <3> into the Bayes formula from <2> gives:

        P(Y=ck | X=x) = P(Y=ck) ∏(j=1..n) P(X(j)=x(j) | Y=ck) / Σ(k=1..K) P(Y=ck) ∏(j=1..n) P(X(j)=x(j) | Y=ck)

<5> In the formula above, the denominator is identical for every class: for the given input vector x, it sums over all values of Y, so no matter which value ck (k = 1, 2, ..., K) we compute the posterior for, the denominator stays the same. Only the numerator affects the relative size of P(Y=ck | X=x), so the denominator can be dropped, giving:

        y = argmax(ck) P(Y=ck) ∏(j=1..n) P(X(j)=x(j) | Y=ck)

Note: argmax means taking the ck for which the expression (the probability) is largest.

<6> With <5> the naive Bayes procedure is actually complete, but we have not yet said how P(Y=ck) and P(X(j)=x(j) | Y=ck) are obtained. Both are estimated from the training data by the maximum likelihood method, i.e. by the following formulas:

        P(Y=ck) = Σ(i=1..N) I(yi = ck) / N

        P(X(j)=a | Y=ck) = Σ(i=1..N) I(xi(j) = a, yi = ck) / Σ(i=1..N) I(yi = ck)

where (xi, yi), i = 1, ..., N, are the training samples, xi(j) is the j-th feature of the i-th sample, and a is one of the possible values of the j-th feature.

Here I(...) is the indicator function. In practice these probabilities are very quick to compute; you can look at the example problem below to see how the two probabilities are actually found. I will not repeat the derivation of these formulas (I am not entirely clear on it myself, but it is analogous to the maximum likelihood estimation for the binomial distribution).
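To make the counting concrete before the example problem, here is a minimal Python sketch of the whole procedure from <3>-<6> (pure frequency counting, no smoothing; all names are my own):

    from collections import Counter, defaultdict

    def train_naive_bayes(xs, ys):
        # Estimate P(Y=ck) and P(X(j)=a | Y=ck) by counting, as in <6>.
        n = len(ys)
        class_counts = Counter(ys)                     # sum_i I(yi = ck)
        prior = {c: cnt / n for c, cnt in class_counts.items()}
        # Count samples whose j-th feature equals a within class c.
        feat_counts = defaultdict(int)
        for x, y in zip(xs, ys):
            for j, a in enumerate(x):
                feat_counts[(j, a, y)] += 1
        cond = {key: cnt / class_counts[key[2]] for key, cnt in feat_counts.items()}
        return prior, cond

    def classify(x, prior, cond):
        # argmax over ck of P(Y=ck) * prod_j P(X(j)=x(j) | Y=ck), as in <5>.
        best_c, best_p = None, -1.0
        for c, p_c in prior.items():
            p = p_c
            for j, a in enumerate(x):
                p *= cond.get((j, a, c), 0.0)   # unseen (j, a, c) -> probability 0
            if p > best_p:
                best_c, best_p = c, p
        return best_c

    # Tiny made-up data set: two binary features, class labels 0/1.
    xs = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 1), (0, 0)]
    ys = [1, 1, 0, 0, 1, 0]
    prior, cond = train_naive_bayes(xs, ys)
    print(prior)                          # {1: 0.5, 0: 0.5}
    print(classify((1, 1), prior, cond))  # 1

A real implementation would work with log-probabilities and Laplace smoothing, to avoid multiplying many small numbers and to handle feature values never seen within a class; the counting above is exactly the two maximum likelihood formulas from <6>.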

3. Example problem: an exercise that strings together everything above (presented in the original post only as an image, which is not reproduced here)
