Maximum a posteriori (MAP) estimation uses empirical data to obtain a point estimate of a quantity that is difficult to observe directly. It is similar to maximum likelihood estimation (MLE), but the key difference is that MAP incorporates a prior distribution of the quantity being estimated. MAP estimation can therefore be seen as a regularized form of maximum likelihood estimation.
First, let's review the maximum likelihood estimation from the previous article. Assume that x is an independent and identically distributed sample, θ is the model parameter, and f is the model we use. The maximum likelihood estimate can be expressed as:

θ̂_MLE(x) = argmax_θ f(x | θ)
Now assume that θ has a prior distribution g. By Bayes' theorem, the posterior distribution of θ is:

f(θ | x) = f(x | θ) g(θ) / ∫ f(x | θ′) g(θ′) dθ′
The goal of maximum a posteriori estimation is to maximize this posterior. Since the denominator does not depend on θ, this is equivalent to:

θ̂_MAP(x) = argmax_θ f(x | θ) g(θ)
Note: MAP estimation can be seen as a specific form of Bayesian estimation.
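To make the MLE/MAP contrast concrete before the cookie example, here is a minimal Python sketch (not from the original article) for a coin-flip model with an assumed Beta(α, β) prior on the heads probability; the closed forms used are the standard MLE and MAP estimates for a Bernoulli parameter:

```python
def mle_bernoulli(heads, n):
    """MLE of a coin's heads probability: argmax over p of p**heads * (1-p)**(n-heads)."""
    return heads / n

def map_bernoulli(heads, n, alpha, beta):
    """MAP estimate of p under an assumed Beta(alpha, beta) prior.

    Maximizing p**(heads + alpha - 1) * (1-p)**(n - heads + beta - 1)
    gives the standard closed form (heads + alpha - 1) / (n + alpha + beta - 2).
    """
    return (heads + alpha - 1) / (n + alpha + beta - 2)

# 7 heads in 10 flips
print(mle_bernoulli(7, 10))        # 0.7
print(map_bernoulli(7, 10, 2, 2))  # ≈ 0.667: the prior pulls the estimate toward 0.5
```

With a Beta(1, 1) (i.e., uniform) prior, `map_bernoulli` returns exactly the MLE.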
For example:
Assume there are five bags, each containing an unlimited number of cookies (cherry-flavored or lemon-flavored). The proportions of the two flavors in the five bags are known to be:
Cherry 100%
Cherry 75% + lemon 25%
Cherry 50% + lemon 50%
Cherry 25% + lemon 75%
Lemon 100%
Given only the information above, suppose you draw two lemon cookies in a row from the same bag. Which of the five bags is it most likely to come from?
We first solve this problem with maximum likelihood estimation by writing out the likelihood function. Let p be the probability of drawing a lemon cookie from the bag (this probability p identifies which bag the cookies came from). Drawing two lemon cookies independently gives the likelihood function:

L(p) = p²
Because p takes only the discrete values 0, 25%, 50%, 75%, and 100% described above, we simply evaluate the likelihood at these five values. It is maximized at p = 100%, i.e., bag 5. This is the result of maximum likelihood estimation.
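This discrete maximization can be checked with a few lines of Python (an illustrative sketch, not from the original article):

```python
# Lemon-cookie probability for each of the five bags
p_lemon = [0.0, 0.25, 0.5, 0.75, 1.0]

# Likelihood of drawing two lemon cookies in a row: L(p) = p^2
likelihood = [p ** 2 for p in p_lemon]

best_bag = max(range(5), key=lambda i: likelihood[i]) + 1
print(likelihood)  # [0.0, 0.0625, 0.25, 0.5625, 1.0]
print(best_bag)    # 5 -- the all-lemon bag maximizes the likelihood
```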
The maximum likelihood estimate above has a problem: it does not take the probability distribution of the model itself into account. Next, let's extend the cookie example.
Suppose the probability of picking bag 1 or bag 5 is 0.1 each, the probability of picking bag 2 or bag 4 is 0.2 each, and the probability of picking bag 3 is 0.4. Is the answer to the question above still the same? This is where MAP comes in. Based on the formula above, we write out our MAP function:

MAP(p) = p² g(p)
According to the problem statement, the values of p are 0, 25%, 50%, 75%, and 100%, and the corresponding prior g is 0.1, 0.2, 0.4, 0.2, and 0.1, respectively. The MAP function therefore evaluates to 0, 0.0125, 0.1, 0.1125, and 0.1. The maximum is attained by the fourth bag, so by MAP the cookies most likely came from bag 4.
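As a quick check of this computation, a short script (illustrative, not from the original article) that evaluates p² g(p) for each bag:

```python
# Lemon-cookie probability for each bag, and the prior on picking each bag
p_lemon = [0.0, 0.25, 0.5, 0.75, 1.0]
prior   = [0.1, 0.2, 0.4, 0.2, 0.1]

# MAP objective: likelihood times prior, i.e. p^2 * g(p)
map_score = [p ** 2 * g for p, g in zip(p_lemon, prior)]

best_bag = max(range(5), key=lambda i: map_score[i]) + 1
print(map_score)  # ≈ [0.0, 0.0125, 0.1, 0.1125, 0.1]
print(best_bag)   # 4 -- the prior shifts the answer from bag 5 to bag 4
```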
The above dealt with discrete values. What about continuous variables? Suppose, for concreteness, that x₁, ..., xₙ are independent samples from a normal distribution N(μ, σ_v²), and that μ itself has a normal prior distribution N(μ₀, σ_m²). We want to find the maximum a posteriori estimate of μ. Following the preceding description, we write the MAP function as:

MAP(μ) = [∏ᵢ f(xᵢ | μ)] g(μ)

In this case it is convenient to take the logarithm of both sides; maximizing the formula above is then equivalent to minimizing

Σᵢ (xᵢ − μ)² / (2σ_v²) + (μ − μ₀)² / (2σ_m²)

Setting the derivative with respect to μ to zero and solving, the resulting μ is

μ̂ = (σ_m² Σᵢ xᵢ + σ_v² μ₀) / (n σ_m² + σ_v²)
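The closed-form estimate can be verified numerically. The sketch below (illustrative, not from the original article) assumes the standard setup for this derivation: Gaussian samples from N(μ, σ_v²) with a Gaussian prior N(μ₀, σ_m²) on μ, and compares the closed form against a brute-force grid search over the log-posterior:

```python
import random

random.seed(0)
mu_true, sigma_v = 5.0, 2.0   # sampling distribution N(mu, sigma_v^2)
mu0, sigma_m = 0.0, 1.0       # prior on mu: N(mu0, sigma_m^2)

xs = [random.gauss(mu_true, sigma_v) for _ in range(20)]
n, s = len(xs), sum(xs)

# Closed-form MAP estimate: (sigma_m^2 * sum(x) + sigma_v^2 * mu0) / (n * sigma_m^2 + sigma_v^2)
mu_map = (sigma_m ** 2 * s + sigma_v ** 2 * mu0) / (n * sigma_m ** 2 + sigma_v ** 2)

# Numerical check: maximize the log-posterior (up to constants) on a fine grid
def log_post(mu):
    return (-sum((x - mu) ** 2 for x in xs) / (2 * sigma_v ** 2)
            - (mu - mu0) ** 2 / (2 * sigma_m ** 2))

grid = [i / 1000 for i in range(-2000, 10001)]
mu_grid = max(grid, key=log_post)
print(mu_map, mu_grid)  # the two agree to grid resolution
```

Note how the prior centered at μ₀ = 0 shrinks the estimate away from the plain sample mean.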
The above is the process of solving the MAP estimate for a continuous variable.
When using MAP, note the following:

The biggest difference between MAP and MLE is that MAP incorporates the probability distribution of the model parameters, i.e., the prior. MLE implicitly treats the prior probability of the model parameter as uniform, that is, as a fixed constant.
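As a sanity check on this point, replacing the bag prior in the cookie example with a uniform one makes MAP pick the same bag as MLE (illustrative Python sketch):

```python
p_lemon = [0.0, 0.25, 0.5, 0.75, 1.0]
uniform_prior = [0.2] * 5  # every bag equally likely

likelihood = [p ** 2 for p in p_lemon]
map_score = [l * g for l, g in zip(likelihood, uniform_prior)]

# A constant prior rescales every score by the same factor,
# so the argmax -- and hence the chosen bag -- is unchanged.
mle_bag = max(range(5), key=lambda i: likelihood[i]) + 1
map_bag = max(range(5), key=lambda i: map_score[i]) + 1
print(mle_bag, map_bag)  # 5 5
```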