Original URL:
http://m.blog.csdn.net/article/details?id=49130173
I. Prior probability, posterior probability, Bayes' formula, likelihood function
In machine learning these concepts come up all the time, but the connections between them are easy to lose track of. The notes below start from the basics, as a memo.
1. Prior probability
The prior probability relies only on subjective, empirical estimation: it is specified beforehand, on the basis of existing knowledge, and has not been verified by experiment; it is a subjective guess based on what is already known.
For example, before tossing a coin, one subjectively infers that P(heads up) = 0.5.
2. Posterior probability
The posterior probability is the probability re-estimated after the "result" information has been obtained; it is the "cause" in the problem of "knowing the effect, seeking the cause". Prior and posterior probabilities are inseparable: the posterior is built on top of the prior. Concretely, given the known result (B), we correct the probability P(A) of the cause and obtain P(A|B), the posterior probability, which is also a conditional probability. The posterior probability can be computed with Bayes' formula.
3. Bayes' formula
Bayes' formula describes the relationship between two conditional probabilities such as P(A|B) and P(B|A). By the multiplication rule:
P(A∩B) = P(A) P(B|A) = P(B) P(A|B)
The formula above can also be rearranged as:
P(A|B) = P(A) P(B|A) / P(B), where P(B) is a normalizing constant.
Bayes' rule in its general form is:
P(A_i | B) = P(B | A_i) P(A_i) / ( P(B | A_1) P(A_1) + ... + P(B | A_n) P(A_n) )
where A_1, ..., A_n form a complete event group, i.e. the A_i are pairwise mutually exclusive, their union is the whole sample space, and each P(A_i) > 0.
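To make the formula concrete, here is a minimal Python sketch (the function name posterior and the example numbers are made up for illustration; they are not from the original post):

```python
def posterior(priors, conditionals, i):
    """Posterior P(A_i | B) from the general Bayes formula.

    priors[j]       -- P(A_j) for a complete event group A_1, ..., A_n
    conditionals[j] -- P(B | A_j)
    """
    # Normalizing constant P(B), by the law of total probability
    p_b = sum(p * c for p, c in zip(priors, conditionals))
    return priors[i] * conditionals[i] / p_b

# Toy example: two possible causes with priors 0.7 / 0.3
print(posterior([0.7, 0.3], [0.1, 0.4], i=1))   # P(A_2 | B) ≈ 0.632
```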
A simple example: a bag contains 3 red balls and 2 white balls, and balls are drawn without replacement. Find:
⑴ the probability that the first ball drawn is red (event A);
⑵ the probability that the second ball drawn is red (event B);
⑶ given that the second ball drawn is red, the probability that the first ball drawn was red.
Solution:
⑴ P(A) = 3/5; this is the prior probability of A;
⑵ P(B) = P(B|A) P(A) + P(B|Ā) P(Ā) = (2/4)(3/5) + (3/4)(2/5) = 3/5; this P(B) is the normalizing constant, and A and Ā form a complete event group;
⑶ P(A|B) = P(A) P(B|A) / P(B) = (3/5)(2/4) / (3/5) = 1/2; this is the posterior probability of A.
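These numbers can be checked with a short Python sketch using exact fractions (the variable names are just illustrative):

```python
from fractions import Fraction as F

p_a    = F(3, 5)   # P(A): first draw is red (the prior)
p_b_a  = F(2, 4)   # P(B|A): second red, given first red (2 red of 4 left)
p_b_na = F(3, 4)   # P(B|~A): second red, given first white (3 red of 4 left)
p_na   = 1 - p_a   # P(~A)

p_b   = p_b_a * p_a + p_b_na * p_na   # total probability: 3/5
p_a_b = p_a * p_b_a / p_b             # Bayes: P(A|B) = 1/2

print(p_b, p_a_b)   # 3/5 1/2
```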
4. Likelihood function
1) Concept
In mathematical statistics, the likelihood function is a function of the parameters of a statistical model; it expresses how plausible different values of the model parameters are.
The likelihood function plays an important role in statistical inference, for example in maximum likelihood estimation and in the Fisher information. "Likelihood" and "probability" sound similar, both referring to how possible an event is, but in statistics there is a clear distinction between them:
Probability is used to predict the results of subsequent observations when the values of some parameters are known, whereas
likelihood is used to estimate the parameters characterizing the object when certain observed results are known.
For example:
For the event "a coin with symmetric heads and tails is tossed ten times", we can ask for the "probability" that all ten tosses land heads up;
whereas for the event "a coin is tossed ten times and lands heads up every time", we can ask how "likely" it is that the coin is fair, i.e. that the probability of heads and of tails is each 0.5.
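A small sketch of this distinction, assuming a binomial model for the tosses (the helper name p_heads is made up for illustration):

```python
from math import comb

def p_heads(k, n, p_h):
    """Binomial probability of k heads in n tosses when P(heads) = p_h."""
    return comb(n, k) * p_h ** k * (1 - p_h) ** (n - k)

# "Probability": the parameter is fixed (a fair coin), the outcome varies
print(p_heads(10, 10, 0.5))           # ≈ 0.000977

# "Likelihood": the outcome is fixed (ten heads), the parameter varies
for p_h in (0.5, 0.7, 0.9, 1.0):
    print(p_h, p_heads(10, 10, p_h))  # grows as p_h approaches 1
```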
2) Definition
Given an outcome x, the likelihood function L(θ|x) of the parameter θ equals the probability that the variable X takes the value x given the parameter θ:
L(θ|x) = P(X = x | θ)
The formula reads as follows: the likelihood of the parameter θ is (numerically) equal to the conditional probability of the observed result X = x given θ. In general, the larger the value of the likelihood function, the more reasonable the parameter θ is under the observed result X = x.
So the likelihood function is also a conditional probability function, only the variable we pay attention to has changed: with the observation fixed, we read the value off as a function of the parameter, and that value is the likelihood of θ:
θ <--> P(B | A = θ)
Bayes' formula P(A|B) = P(B|A) P(A) / P(B) can therefore also be written in the form:
posterior probability of A = (likelihood × prior probability of A) / normalizing constant
In other words, the posterior probability is proportional to the product of the prior probability and the likelihood.
Note that the likelihood function is not required to be normalized: the sum over θ of P(B | A = θ) need not equal 1.
A likelihood function multiplied by a positive constant is still a likelihood function: for any α > 0 we may take
L(θ|x) = α P(X = x | θ).
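Both points can be seen numerically in a small sketch, assuming for concreteness the two-heads observation used in the example below and an arbitrary three-point grid of θ values:

```python
thetas = [0.3, 0.5, 0.8]                 # candidate values of θ (toy grid)
likelihood = [t ** 2 for t in thetas]    # L(θ|HH) = P(X = HH | θ) = θ^2

# The likelihood values need not sum to 1 over θ
print(sum(likelihood))                   # ≈ 0.98, not 1

# Multiplying by a positive constant α leaves the comparison unchanged
alpha = 10.0
scaled = [alpha * v for v in likelihood]
print(likelihood.index(max(likelihood)) == scaled.index(max(scaled)))  # True

# Posterior ∝ prior × likelihood (uniform prior here), then normalize
prior = [1 / 3] * 3
unnorm = [p * v for p, v in zip(prior, likelihood)]
posterior = [u / sum(unnorm) for u in unnorm]
print(posterior)                         # proportional to the likelihoods
```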
3) Example
For example, consider the experiment of tossing a coin. Usually, the probabilities that a given coin lands heads up and tails up are each assumed to be p_H = 0.5, and from this one can compute the probabilities of the various outcomes of several tosses. For example, the probability of two heads in a row is 0.25. In terms of conditional probability, this is:
P(HH | p_H = 0.5) = 0.5^2 = 0.25
where H means heads up.
In statistics, what we care about is the information that a known sequence of toss results carries about the coin's tendency to land heads up. We can set up a statistical model: assume the coin lands heads up with probability p_H and tails up with probability 1 - p_H. The conditional probability above can then be rewritten as a likelihood function:
L(p_H = 0.5 | HH) = P(HH | p_H = 0.5) = 0.25
That is, given that two heads in a row have been observed, the likelihood of p_H = 0.5 is 0.25 (this does not mean that the probability that p_H = 0.5 is 0.25 once two heads have been observed).
If we consider p_H = 0.6 instead, the value of the likelihood function changes:
L(p_H = 0.6 | HH) = P(HH | p_H = 0.6) = 0.36
Notice that the value of the likelihood function has become larger. This means that if the parameter p_H were 0.6 rather than 0.5, observing two heads in a row would be more probable; in other words, the value p_H = 0.6 is more convincing, more "reasonable", than 0.5 for this observation. In short, what matters about the likelihood function is not its specific value, but whether it becomes larger or smaller as the parameter changes. For a given likelihood function, if there is a parameter value at which it reaches its maximum, that value is the most "reasonable" parameter value.
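A minimal sketch reproducing the comparison above (only the two observed heads matter, so the likelihood reduces to p_H squared):

```python
def likelihood_hh(p_h):
    """L(p_H | HH): probability of two heads in a row when P(heads) = p_H."""
    return p_h ** 2

print(likelihood_hh(0.5))   # 0.25
print(likelihood_hh(0.6))   # 0.36 -> larger, so p_H = 0.6 fits HH better
```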
In this example, the likelihood function is actually equal to:
L(p_H = θ | HH) = P(HH | p_H = θ) = θ^2
If we take p_H = 1, the likelihood function reaches its maximum value of 1. That is, when two heads in a row are observed, the assumption that the coin lands heads up with probability 1 is the most "reasonable" one.
Similarly, if three tosses are observed, with the first two landing heads up and the third tails up, the likelihood function becomes:
L(p_H = θ | HHT) = P(HHT | p_H = θ) = θ^2 (1 - θ), where T means tails up and 0 <= θ <= 1
The likelihood function now attains its maximum at p_H = 2/3. That is, when the observation is two heads followed by one tail, the most reasonable estimate of the probability that the coin lands heads up is p_H = 2/3.
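As a sanity check, a coarse grid search over the parameter (step 0.001, chosen arbitrarily) recovers the same maximizer; setting the derivative 2θ - 3θ^2 to zero also gives θ = 2/3:

```python
def likelihood_hht(p_h):
    """L(p_H | HHT) = p_H^2 * (1 - p_H) for the observation heads, heads, tails."""
    return p_h ** 2 * (1 - p_h)

# Coarse grid search over the parameter
grid = [i / 1000 for i in range(1001)]
best = max(grid, key=likelihood_hht)
print(best)                    # 0.667, i.e. ≈ 2/3
print(likelihood_hht(2 / 3))   # ≈ 0.148, the maximum likelihood value
```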