Bayesian Introduction
Topics: Bayesian learning · characteristics of the method · Bayes rule · maximum hypotheses · example · basic probability formula table
My progress in learning machine learning is not fast, but I hope to learn it in a down-to-earth way. After all, the subject leans heavily on mathematics, so learning it rigorously and thoroughly makes it easier to apply it to the right scenarios.

Bayesian Overview
Bayesian inference provides a probabilistic means of reasoning. It is based on the following assumption: the quantities of interest follow certain probability distributions, and the optimal decision can be made by reasoning about these distributions together with the observed data.
Bayesian learning is related to machine learning for the following two reasons:
1. Bayesian learning algorithms can compute explicit probabilities for hypotheses. The naive Bayes classifier, for example, is one of the most practical approaches to certain learning problems; for text document categorization (such as classifying electronic news articles), it is among the most effective methods.
2. Bayesian learning provides an effective means of understanding many learning algorithms, even those that do not explicitly manipulate probabilities.

Characteristics of Bayesian learning methods

- Each observed training example can incrementally decrease or increase the estimated probability of a hypothesis. This is more flexible than algorithms that completely eliminate a hypothesis as soon as it is inconsistent with any single example.
- Prior knowledge can be combined with the observed data to determine the final probability of a hypothesis. In Bayesian learning, prior knowledge can take the form of (1) a prior probability for each candidate hypothesis, and (2) a probability distribution over the observable data for each possible hypothesis.
- Bayesian methods allow hypotheses to make uncertain predictions (for example, the hypothesis that this pneumonia patient has a 93% chance of recovery).
- New instances can be classified by combining the predictions of multiple hypotheses, weighted by their probabilities.
- Even when Bayesian methods are computationally intractable, they can still serve as a standard of optimal decision making against which other algorithms are measured.

Bayes Rule
In machine learning, we are usually interested in determining the best hypothesis from a hypothesis space H, given the observed training data D. One way to define "best" is as the most probable hypothesis, given the data D together with any prior knowledge about the prior probabilities of the various hypotheses in H. Bayes' theorem provides a direct way to compute this probability. More precisely, Bayes' rule gives a method for computing the probability of a hypothesis, based on the hypothesis's prior probability, the probability of observing the data given that hypothesis, and the observed data itself.
All of the above is meant to set up two key concepts: the prior probability and the posterior probability. Use P(h) to denote the initial probability that hypothesis h holds, before any training data has been observed; P(h) is called the prior probability of h. Use P(x|y) to denote the probability of x given y. Accordingly, P(h|D) denotes the probability that h holds given the training data D, and is called the posterior probability of h. The posterior probability P(h|D) reflects the influence of the training data D, whereas the prior probability P(h) is independent of D.
Bayes' rule is the foundation of Bayesian learning methods, because it provides a way to compute the posterior probability P(h|D) from the prior probability P(h) together with P(D) and P(D|h).
Bayesian formula
P(h|D) = P(D|h) P(h) / P(D)
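As a concrete illustration, here is a minimal Python sketch of the formula; the function name and the numbers are my own, purely illustrative choices and not from the original text:

```python
def posterior(prior_h, likelihood_d_given_h, evidence_d):
    """Bayes rule: P(h|D) = P(D|h) * P(h) / P(D)."""
    return likelihood_d_given_h * prior_h / evidence_d

# Illustrative numbers only: P(h) = 0.3, P(D|h) = 0.9, P(D) = 0.5
print(posterior(0.3, 0.9, 0.5))   # ≈ 0.54

# All else equal, a larger P(D) yields a smaller posterior:
print(posterior(0.3, 0.9, 0.9))   # ≈ 0.3
```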
As you can see, the larger P(D) is, the smaller P(h|D) becomes, which is reasonable: if the data D itself has a high probability of being observed regardless of h, then observing D lends less support to h.

Maximum Hypothesis
In many learning scenarios, the learner considers a set of candidate hypotheses H and is interested in finding the most probable hypothesis h ∈ H given the observed data D (or one of the maximally probable ones, if several tie). Such a maximally probable hypothesis is called the maximum a posteriori (MAP) hypothesis. The MAP hypothesis can be determined by using the Bayes formula to compute the posterior probability of each candidate hypothesis; h_MAP denotes this hypothesis.
h_MAP ≡ argmax_{h∈H} P(h|D) = argmax_{h∈H} P(D|h) P(h) / P(D) = argmax_{h∈H} P(D|h) P(h)
Note that P(D) is dropped in the last step because it is a constant that does not depend on h.
In some cases we can assume that every hypothesis in H has the same prior probability, that is, P(h_i) = P(h_j) for any h_i and h_j. This simplifies the formula further: to find the most probable hypothesis we only need to consider P(D|h).
P(D|h) is called the likelihood of the data D given h, and the hypothesis that maximizes P(D|h) is called the maximum likelihood (ML) hypothesis h_ML:
h_ML ≡ argmax_{h∈H} P(D|h)
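To make the two definitions concrete, here is a small Python sketch; the candidate hypotheses and their priors and likelihoods are invented for illustration and are not from the original text:

```python
# Hypothetical candidate hypotheses with assumed priors P(h) and likelihoods P(D|h).
# These numbers are illustrative only.
candidates = {
    "h1": {"prior": 0.7, "likelihood": 0.1},
    "h2": {"prior": 0.2, "likelihood": 0.5},
    "h3": {"prior": 0.1, "likelihood": 0.9},
}

# MAP hypothesis: argmax over h of P(D|h) * P(h)  (P(D) is constant and can be dropped)
h_map = max(candidates, key=lambda h: candidates[h]["likelihood"] * candidates[h]["prior"])

# ML hypothesis: argmax over h of P(D|h)  (same as MAP when all priors are equal)
h_ml = max(candidates, key=lambda h: candidates[h]["likelihood"])

print(h_map)  # h2: 0.5 * 0.2 = 0.10 beats 0.1 * 0.7 = 0.07 and 0.9 * 0.1 = 0.09
print(h_ml)   # h3: it has the largest likelihood P(D|h) = 0.9
```

Notice how the two answers differ: a strong prior can outweigh a large likelihood when computing the MAP hypothesis.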
In these formulas, the data D refers to the training examples of some target function, and H is the space of candidate target functions.

Example
From: Machine learning
Suppose that in the general population the probability that a person has cancer is 0.8%. A lab test correctly returns a positive result ⊕ for 98% of people who have cancer, and correctly returns a negative result ⊖ for 97% of people who do not. If a person takes the test and the result is ⊕, should we conclude that the person has cancer (that is, what is P(cancer|⊕))?
First, from the given conditions we know:
P(cancer) = 0.8%
P(¬cancer) = 99.2%
P(⊕|cancer) = 98%
P(⊕|¬cancer) = 3%
P(⊖|cancer) = 2%
P(⊖|¬cancer) = 97%
So according to the formula above, you can get:
P(⊕|cancer) P(cancer) = 98% × 0.8% ≈ 0.78%
P(⊕|¬cancer) P(¬cancer) = 3% × 99.2% ≈ 2.98%
So h_MAP = ¬cancer. The exact posterior probabilities can be obtained by normalizing the two quantities above so that they sum to 1:
P(cancer|⊕) = 0.78% / (0.78% + 2.98%) ≈ 0.21
This step is justified because the Bayes formula says the posterior probability is the quantity above divided by P(⊕). Although P(⊕) is not given directly, P(cancer|⊕) and P(¬cancer|⊕) must sum to 1, so we can simply normalize. Thus, although the posterior probability of cancer is noticeably greater than its prior probability, the most probable hypothesis is still that the patient does not have cancer.
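The arithmetic above can be verified with a short Python snippet; the variable names are my own, but the numbers are taken directly from the example:

```python
# Numbers from the example above
p_cancer = 0.008              # P(cancer)
p_not_cancer = 0.992          # P(¬cancer)
p_pos_given_cancer = 0.98     # P(⊕|cancer)
p_pos_given_not = 0.03        # P(⊕|¬cancer)

# Unnormalized posteriors: P(⊕|h) * P(h)
score_cancer = p_pos_given_cancer * p_cancer   # ≈ 0.0078
score_not = p_pos_given_not * p_not_cancer     # ≈ 0.0298

# Normalize so the two posteriors sum to 1
p_cancer_given_pos = score_cancer / (score_cancer + score_not)

print(round(score_cancer, 4), round(score_not, 4))  # 0.0078 0.0298
print(round(p_cancer_given_pos, 2))                 # 0.21 -> h_MAP is ¬cancer
```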
Note that P(⊕), the probability of a positive test result, can be computed using the theorem of total probability given below.

Basic Probability Formula Table

Product rule: the probability P(A∧B) of the conjunction of two events A and B:
P(A∧B) = P(A|B) P(B) = P(B|A) P(A)

Sum rule: the probability P(A∨B) of the disjunction of two events A and B:
P(A∨B) = P(A) + P(B) − P(A∧B)

Bayes theorem: the posterior probability P(h|D) of h given D:
P(h|D) = P(D|h) P(h) / P(D)

Theorem of total probability: if the events A_1, …, A_n are mutually exclusive and ∑_{i=1}^{n} P(A_i) = 1, then:
P(B) = ∑_{i=1}^{n} P(B|A_i) P(A_i)
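As an illustration, the P(⊕) needed in the cancer example can be obtained from the theorem of total probability. A quick sketch (the values are taken from the example above):

```python
# P(⊕) via total probability: P(⊕) = P(⊕|cancer)P(cancer) + P(⊕|¬cancer)P(¬cancer)
p_pos = 0.98 * 0.008 + 0.03 * 0.992
print(round(p_pos, 4))  # 0.0376

# Dividing the unnormalized score by P(⊕) gives the same posterior as normalizing:
print(round(0.98 * 0.008 / p_pos, 2))  # P(cancer|⊕) ≈ 0.21
```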
These are some of the basic results behind Bayesian methods; they are essential foundations for what comes later, such as brute-force Bayes concept learning and the naive Bayes classifier. I will keep studying them in the following posts.