**Maximum likelihood estimate:**

Maximum likelihood estimation provides a method for estimating model parameters from observed data; in short: "the model is determined, the parameters are unknown." For example, suppose we want to know the height distribution of a country's population. We first assume that height follows a normal distribution, but the mean and variance of that distribution are unknown. We do not have the manpower or resources to measure the height of every person in the country, but we can obtain the heights of some people by sampling, and then use maximum likelihood estimation to obtain the mean and variance of the hypothesized normal distribution.

The sampling in maximum likelihood estimation must satisfy one important assumption: all samples are independent and identically distributed (i.i.d.). Let us describe maximum likelihood estimation in detail.

First, assume the samples X = (x1, x2, ..., xn) are independent and identically distributed, θ is the model parameter, and f is the model we use. Under the i.i.d. assumption, the probability that the model f with parameter θ produces the sample above can be expressed as:

f(x1, x2, ..., xn | θ) = f(x1 | θ) · f(x2 | θ) · ... · f(xn | θ)

Returning to "the model is determined, the parameters are unknown": here the unknown is θ, so the likelihood is defined as:

L(θ | x1, x2, ..., xn) = f(x1, x2, ..., xn | θ) = ∏ f(xi | θ)   (product over i = 1, ..., n)

In practice it is common to take the logarithm of both sides, giving:

ln L(θ | x1, ..., xn) = Σ ln f(xi | θ)   (sum over i = 1, ..., n)

This is called the log-likelihood, and (1/n) ln L is called the mean log-likelihood. What we call the maximum likelihood estimate is the value of θ that maximizes the mean log-likelihood, namely:

θ̂ = argmax over θ of (1/n) ln L(θ | x1, ..., xn)

For example: suppose a jar contains white and black balls, and we draw a ball 100 times with replacement. A white ball comes out with probability p, and a black ball with probability 1-p. Since after each draw we record the color, put the ball back into the jar, and shake it evenly, the color of each drawn ball follows the same independent distribution. Here we call the color of one drawn ball a sample. Of the 100 samples, 70 were white. We want the value of p that maximizes P(Data | M), where Data is all the data and M is the given model, which says that each drawn ball is white with probability p. If the result of the first sample is recorded as x1, the result of the second as x2, and so on, then Data = (x1, x2, ..., x100). This way,

P(Data | M)

= P(x1, x2, ..., x100 | M)

= P(x1 | M) P(x2 | M) ... P(x100 | M)

= p^70 (1-p)^30

So for what value of p is P(Data | M) maximized? Differentiate p^70 (1-p)^30 with respect to p and set the derivative equal to zero:

70 p^69 (1-p)^30 - 30 p^70 (1-p)^29 = 0

Solving this equation gives p = 0.7.

At the boundary points p = 0 and p = 1, P(Data | M) = 0, so P(Data | M) attains its maximum at p = 0.7. This agrees with the common-sense answer of taking the proportion observed in the sample.
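The closed-form answer above can be checked numerically. The following is a minimal sketch (function and variable names are my own) that evaluates the likelihood p^70 (1-p)^30 on a grid of candidate values of p and picks the largest:

```python
def likelihood(p, white=70, black=30):
    """P(Data | M) = p^white * (1 - p)^black for the ball-drawing example."""
    return p ** white * (1 - p) ** black

# Evaluate the likelihood on a fine grid of candidate values of p in [0, 1].
grid = [i / 1000 for i in range(1001)]
best_p = max(grid, key=likelihood)
print(best_p)  # -> 0.7, matching the derivative calculation above
```

The grid search lands on p = 0.7, the same value the calculus gives.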

Now suppose we have a set of sample values of a continuous variable, (x1, x2, ..., xn), and we know this data follows a normal distribution whose standard deviation σ is known. What mean μ of this normal distribution is most likely to have produced the existing data?

P(Data | M) = ∏ (1 / (σ√(2π))) exp(-(xi - μ)² / (2σ²))   (product over i = 1, ..., n)

Taking the logarithm and dropping the terms that do not depend on μ, maximizing this is equivalent to minimizing

Σ (xi - μ)²   (sum over i = 1, ..., n)

Differentiating with respect to μ and setting the derivative to zero, the maximum likelihood estimate is μ = (x1 + x2 + ... + xn) / n, i.e. the sample mean.
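This result can also be verified numerically. Here is a small sketch (the data values are made up for illustration): since the log-likelihood of i.i.d. normal data with known σ is, up to constants, -Σ(xi - μ)²/(2σ²), a grid search over μ should land on the sample mean:

```python
# Made-up sample values for illustration.
data = [1.2, 0.8, 1.5, 0.9, 1.1]

def log_likelihood(mu, xs=data, sigma=1.0):
    # Normal log-likelihood up to additive constants that do not depend on mu.
    return -sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)

grid = [i / 1000 for i in range(2001)]  # candidate mu values in [0, 2]
best_mu = max(grid, key=log_likelihood)
print(best_mu, sum(data) / len(data))  # both 1.1: the MLE is the sample mean
```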

The general solution process for maximum likelihood estimation is as follows:

(1) Write out the likelihood function;

(2) Take the logarithm of the likelihood function and simplify;

(3) Differentiate with respect to the parameters and set the derivatives to zero;

(4) Solve the resulting likelihood equations.

Note: maximum likelihood estimation only considers the probability that a model can produce the given observation sequence; it does not consider the probability of the model itself. This is what distinguishes it from Bayesian estimation. The Bayesian estimation method will be described in a later blog post.

This article references:

http://en.wikipedia.org/wiki/Maximum_likelihood

**Maximum a posteriori probability:**

The maximum a posteriori (MAP) estimate is a point estimate of a hard-to-observe quantity based on empirical data. It is similar to the maximum likelihood estimate, but the biggest difference is that the MAP estimate incorporates the prior distribution of the quantity being estimated. The MAP estimate can therefore be regarded as a regularized maximum likelihood estimate.

First, recall the maximum likelihood estimate from the previous section: assume X is an i.i.d. sample, θ is the model parameter, and f is the model we are using. Then the maximum likelihood estimate can be expressed as:

θ̂_mle(x) = argmax over θ of f(x | θ)

Now suppose the prior distribution of θ is g. By Bayes' theorem, the posterior distribution of θ is given by the following formula:

f(θ | x) = f(x | θ) g(θ) / ∫ f(x | θ′) g(θ′) dθ′

The goal of MAP is to maximize this posterior distribution. Since the denominator does not depend on θ, this amounts to:

θ̂_map(x) = argmax over θ of f(x | θ) g(θ)

Note: The maximum posteriori estimate can be considered as a specific form of Bayesian estimation.

For example:

Suppose there are five bags, each holding an unlimited number of cookies (cherry or lemon), and the known flavor ratios of the five bags are:

Cherry 100%

Cherry 75% + Lemon 25%

Cherry 50% + Lemon 50%

Cherry 25% + Lemon 75%

Lemon 100%

Given only the above conditions, if two lemon cookies are drawn from the same bag, which of the five bags is that bag most likely to be?

We first solve this problem with maximum likelihood estimation by writing out the likelihood function. Let p be the probability that a cookie drawn from the bag is lemon (p determines which bag the cookies were taken from). The likelihood of drawing two lemon cookies is then

L(p) = p²

Since p only takes the discrete values 0, 25%, 50%, 75%, 100% listed above, we just need to evaluate which of these five values maximizes the likelihood; the answer is bag 5 (p = 100%). That is the result of maximum likelihood estimation.
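The discrete MLE above amounts to a one-line comparison. A minimal sketch (variable names are my own):

```python
# Lemon fractions of bags 1..5 and the likelihood of drawing two lemon
# cookies from each: L(p) = p ** 2.
lemon_fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
likelihoods = [p ** 2 for p in lemon_fractions]

# Pick the bag (1-indexed) whose likelihood is largest.
best_bag = max(range(5), key=lambda i: likelihoods[i]) + 1
print(best_bag)  # -> 5, the all-lemon bag
```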

One problem with the maximum likelihood estimate above is that it does not take the probability distribution over the bags themselves into account. Let us extend the cookie problem.

Suppose the probability of getting bag 1 or bag 5 is 0.1 each, the probability of getting bag 2 or bag 4 is 0.2 each, and the probability of getting bag 3 is 0.4. Is the answer to the question above still the same? This is where we switch to MAP. Based on the formula

θ̂_map(x) = argmax over θ of f(x | θ) g(θ)

we write out our MAP objective: p² g(p).

As described in the problem, p takes the values 0, 25%, 50%, 75%, 100%, and g is 0.1, 0.2, 0.4, 0.2, 0.1 respectively. The corresponding values of the MAP objective p² g are 0, 0.0125, 0.1, 0.1125, 0.1. The MAP estimate therefore says the cookies most likely came from the fourth bag.
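The MAP computation is the same comparison with each likelihood weighted by its prior. A minimal sketch (variable names are my own):

```python
# Lemon fractions of bags 1..5 and the prior probability of picking each bag.
lemon_fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
priors = [0.1, 0.2, 0.4, 0.2, 0.1]

# MAP objective: likelihood p ** 2 weighted by the bag's prior g.
posteriors = [p ** 2 * g for p, g in zip(lemon_fractions, priors)]
best_bag = max(range(5), key=lambda i: posteriors[i]) + 1
print(best_bag)  # -> 4: the prior pulls the answer away from the all-lemon bag
```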

All of the above involved discrete variables, so what about continuous variables? Suppose the samples are i.i.d. normal with known variance σ², and μ has a prior probability distribution which, for concreteness, we take to be normal, N(μ0, σ0²). We want the maximum a posteriori estimate of μ. Following the earlier description, we write the MAP objective (dropping constant factors) as:

∏ exp(-(xi - μ)² / (2σ²)) · exp(-(μ - μ0)² / (2σ0²))   (product over i = 1, ..., n)

At this point we take the logarithm of both sides. Maximizing the expression above is equivalent to minimizing

Σ (xi - μ)² / (2σ²) + (μ - μ0)² / (2σ0²)

Differentiating with respect to μ and setting the derivative to zero gives

μ = (σ0² Σ xi + σ² μ0) / (n σ0² + σ²)

The above is the MAP solution process for continuous variables.
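The closed-form MAP mean can be checked against a direct numerical minimization. Here is a sketch, assuming the normal prior above; the data and hyperparameter values are made up for illustration:

```python
# Made-up data and hyperparameters: known noise sigma, prior N(mu0, sigma0^2).
data = [1.2, 0.8, 1.5, 0.9, 1.1]
sigma, mu0, sigma0 = 1.0, 0.0, 1.0
n = len(data)

def neg_log_posterior(mu):
    # Sum of squared residuals plus the prior penalty, as in the text.
    return (sum((x - mu) ** 2 for x in data) / (2 * sigma ** 2)
            + (mu - mu0) ** 2 / (2 * sigma0 ** 2))

# Closed-form MAP estimate derived above.
closed_form = (sigma0 ** 2 * sum(data) + sigma ** 2 * mu0) / (n * sigma0 ** 2 + sigma ** 2)

# Grid search over candidate mu values in [0, 2].
grid = [i / 10000 for i in range(20001)]
best_mu = min(grid, key=neg_log_posterior)
print(round(closed_form, 4), round(best_mu, 4))  # both approx. 0.9167
```

Note that the MAP mean is pulled from the sample mean (1.1) toward the prior mean (0), which is the regularizing effect of the prior.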

What we should note about MAP:

The biggest difference between MAP and MLE is that MAP includes the prior probability distribution of the model parameters. In MLE, the prior over the model parameters is implicitly assumed to be uniform, that is, a fixed constant.

