Introduction: There are two main approaches in probability and statistics: parametric statistics and non-parametric statistics (equivalently, parametric estimation and non-parametric estimation).

**Parameter Estimation** is a method in probability and statistics. Given a sample, we know (or assume) that it follows a certain probability distribution, but we do not know the specific parameters of that distribution (in other words, we know the model but not its parameters). Parameter estimation uses the results of repeated observations to derive approximate values for those parameters. (Once you have estimated a parameter with high confidence, it is as if you knew both the distribution and its parameters, and you can use them to estimate the probability of other samples; this is one application.)

There are many methods for parameter estimation. Here we analyze three probability-based methods: maximum likelihood estimation (MLE), Bayesian estimation, and maximum a posteriori (MAP) estimation. Assume the observed variable is $X$, the observed values (samples) are $x = \{x_1, x_2, \ldots, x_N\}$, the parameter to be estimated is $\theta$, and the distribution function of $X$ is $p(x \mid \theta)$ (we write it as a conditional probability to make explicit that the distribution depends on $\theta$). In reality, $x$ and $\theta$ can both be vectors of several variables; here we treat them as scalars for simplicity (if $\theta$ is a vector, replace derivatives with partial derivatives). The distribution $p(x \mid \theta)$ can be Gaussian or any other distribution.

"Likelihood" here means "the probability of the observed event (that is, the observed data)". Maximum likelihood estimation seeks the $\hat{\theta}$ that maximizes the probability of the observed data, that is,

$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_\theta \, p(x \mid \theta).$$

In general, we assume the samples are independent and identically distributed (IID), so

$$p(x \mid \theta) = \prod_{i=1}^{N} p(x_i \mid \theta).$$

Generally, each $p(x_i \mid \theta)$ is small and $N$ is large, so the product is prone to floating-point underflow. We therefore maximize the equivalent logarithmic form:

$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_\theta \sum_{i=1}^{N} \log p(x_i \mid \theta).$$
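The numerical motivation for the log form can be seen directly; this is a minimal sketch with invented numbers (the sample size and per-sample probability are arbitrary illustrative choices):

```python
import math

# A likelihood that is a product of N small per-sample probabilities
# underflows to 0.0, while the equivalent sum of logs stays representable.
N = 1000
p = 1e-5  # assumed per-sample probability, purely illustrative

product = 1.0
for _ in range(N):
    product *= p          # underflows to exactly 0.0 long before the loop ends

log_sum = N * math.log(p)  # the same quantity, computed in log space

print(product)   # 0.0 -- the direct product is useless
print(log_sum)   # a large negative but finite number, still usable
```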

To solve this concretely, take the derivative of the right-hand side with respect to $\theta$, set it to zero, and solve; the resulting value is $\hat{\theta}_{\mathrm{MLE}}$.

**In practice**, we usually know the specific family of distributions, parameterized by $\theta$. We can then write the probability of each (independent) observation as a formula containing $\theta$, so the log-likelihood $l(\theta)$ becomes an expression in $\theta$. Finally, take the derivative (or partial derivatives), set it to zero, and solve the resulting equation (or system of equations). Note: maximum likelihood treats the parameter as a deterministic but unknown quantity; the best estimate is the value that maximizes the probability of producing the observed sample. This estimate has maximum likelihood, but it is not necessarily unbiased.
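The recipe above (write the log-likelihood, then maximize over $\theta$) can be sketched for a Bernoulli model; the data and the grid search below are illustrative assumptions, not from the original post:

```python
import math

# Toy maximum likelihood estimation for a Bernoulli parameter theta.
# Invented data: 1 = success, 0 = failure (7 successes out of 10).
samples = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]

def log_likelihood(theta, xs):
    """Sum of log p(x_i | theta) for IID Bernoulli observations."""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)

# Grid search over theta. Setting the derivative to zero gives the
# closed form theta_hat = (#successes) / N, so the grid maximum
# should land exactly on 0.7 here.
grid = [i / 1000 for i in range(1, 1000)]
theta_mle = max(grid, key=lambda t: log_likelihood(t, samples))

print(theta_mle)  # 0.7, matching sum(samples) / len(samples)
```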

**Maximum likelihood estimation is a point estimate**: it yields only a single value of the parameter. (1) Sometimes we want more than a single value $\hat{\theta}$; we also want the distribution of $\theta$ given the observed data. (2) Maximum likelihood estimates the overall distribution from (limited) observed data alone, which may be inaccurate when the data volume is small. For example, suppose we want to estimate the average weight of the population, but everyone we sample happens to be a child; the resulting average will not reflect the overall population. We should take into account prior knowledge such as "children account for 20% of the total population". In such cases, we can use the Bayesian method.

Using Bayes' formula, we can combine our prior knowledge with the observed data to obtain the posterior probability:

$$p(\theta \mid x) = \frac{p(x \mid \theta)\, p(\theta)}{p(x)}, \qquad p(x) = \int p(x \mid \theta)\, p(\theta)\, d\theta.$$

Here $p(x)$ is the normalization factor that ensures the posterior sums (or integrates) to 1. To use the Bayesian method, we need prior knowledge $p(\theta)$, that is, the probability of different values of $\theta$. For example, let $\theta = 1$ indicate rain and $\theta = 0$ indicate no rain; from past experience we generally have some belief about $p(\theta = 1)$. When such knowledge is insufficient, we can assume a uniform prior, that is, every value of $\theta$ is equally probable.

Given a particular value of $\theta$, the probability of the data is $p(x \mid \theta)$, a function of $\theta$ (for example, a univariate normal distribution). As in the previous section, we assume the samples are independent, so the likelihood factorizes as $p(x \mid \theta) = \prod_{i} p(x_i \mid \theta)$. In this way, we obtain the posterior $p(\theta \mid x)$ as an expression over the different values of $\theta$.

From the posterior $p(\theta \mid x)$, we can take the value that maximizes it and denote it $\hat{\theta}$. Some readers may spot a problem here: we have done a lot of extra work. To obtain a single $\hat{\theta}$, we computed the posterior over all possible values of $\theta$. Of course, the full distribution is sometimes useful, but sometimes we do not need it; we only want $\hat{\theta}$. This is where maximum a posteriori estimation comes in.
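The full posterior computation can be sketched on a discrete grid of $\theta$ values; the data, grid resolution, and uniform prior below are all illustrative choices:

```python
import math

# Discrete-grid Bayesian update for a Bernoulli parameter theta.
samples = [1, 0, 1, 1, 1, 0, 1, 1]            # made-up data: 6 successes of 8

grid = [i / 100 for i in range(1, 100)]        # candidate theta values
prior = [1.0 / len(grid)] * len(grid)          # uniform prior p(theta)

def likelihood(theta, xs):
    """p(x | theta) for IID Bernoulli observations."""
    return math.prod(theta if x else (1 - theta) for x in xs)

# Posterior  p(theta | x)  is proportional to  p(x | theta) p(theta);
# dividing by the evidence p(x) normalizes it to sum to 1.
unnorm = [likelihood(t, samples) * p for t, p in zip(grid, prior)]
evidence = sum(unnorm)                          # p(x), the normalizing constant
posterior = [u / evidence for u in unnorm]

# The result is a full distribution over theta, not a single point estimate.
mode = grid[max(range(len(grid)), key=lambda i: posterior[i])]
print(mode)  # with a uniform prior the posterior mode equals the MLE, 0.75
```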

**Note:** Bayesian estimation treats the parameter to be estimated as a random variable following some prior probability distribution. Observing samples converts the prior probability density into a posterior probability density, so that the initial estimate of the parameter is corrected by the sample information. A typical effect in Bayesian estimation is that each new observed sample makes the posterior probability density function sharper, forming its largest peak near the true value of the parameter.

- Maximum A Posteriori (MAP) Estimation

Maximum a posteriori estimation uses the Bayesian idea, but it obtains $\hat{\theta}$ directly instead of solving for the full posterior $p(\theta \mid x)$. From the Bayesian formula, $p(x)$ is independent of $\theta$, so maximizing the posterior is equivalent to solving:

$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_\theta \, p(x \mid \theta)\, p(\theta).$$

As in maximum likelihood estimation, we usually maximize the corresponding logarithmic form:

$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_\theta \left( \sum_{i=1}^{N} \log p(x_i \mid \theta) + \log p(\theta) \right).$$

In this way, we obtain the $\hat{\theta}_{\mathrm{MAP}}$ we want without having to compute $p(x)$ or the full posterior.

Like maximum likelihood, MAP assumes the parameter is an unknown but fixed value; the only difference is that the objective function takes the form of the posterior probability, adding a prior probability term.
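The MAP objective (log-likelihood plus log-prior) can be sketched for a Bernoulli model with a Beta prior; the data and hyperparameters here are invented for illustration:

```python
import math

# MAP estimation for a Bernoulli parameter with a Beta(alpha, beta) prior.
samples = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]     # made-up data: 7 successes of 10
alpha, beta = 2.0, 2.0                        # mild prior pulling theta to 0.5

def log_posterior(theta, xs):
    """log p(x | theta) + log p(theta), dropping theta-independent constants."""
    ll = sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)
    log_prior = (alpha - 1) * math.log(theta) + (beta - 1) * math.log(1 - theta)
    return ll + log_prior

grid = [i / 1000 for i in range(1, 1000)]
theta_map = max(grid, key=lambda t: log_posterior(t, samples))

# Closed form for this conjugate pair:
# (k + alpha - 1) / (N + alpha + beta - 2) = 8/12 ~ 0.667,
# pulled from the MLE of 0.7 toward the prior mean of 0.5.
print(theta_map)
```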

Differences and summary:

**Differences:** The biggest difference between Bayesian estimation and the other two is that it assumes the parameter is itself a random variable, not a fixed value. Given the sample data, it considers all possible values of the parameter and computes the full posterior distribution over them (from which, for example, a class-conditional density can be obtained by taking an expectation over the parameter). That is to say, Bayesian estimation does not, in the manner of maximum likelihood, seek the single parameter value that maximizes the probability of the observed sample. Instead, it finds all possible parameter values and their corresponding credibility (for lack of a better term, their posterior probabilities). In this way, you know how credible different parameter values are. For example, ① if three candidate values have posterior probabilities 0.8, 0.05, and 0.05, the first one clearly dominates; but ② if the three values have posterior probabilities 0.4, 0.39, and 0.39, you must be careful before selecting the first one.

However, when the posterior distribution of the parameter is sharply peaked (as in case ①) and the prior is relatively flat, the maximum likelihood estimate is close to the Bayesian one.
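This convergence can be illustrated with the closed-form Beta-Bernoulli estimates; the prior strength and the fixed 70% success rate below are arbitrary choices for the sketch:

```python
# As the sample size grows, the likelihood term dominates the prior,
# so the MAP estimate approaches the MLE.
alpha, beta = 5.0, 5.0   # fairly strong Beta prior centered at 0.5

def map_estimate(k, n):
    """Closed-form Beta-Bernoulli MAP: mode of Beta(k+alpha, n-k+beta)."""
    return (k + alpha - 1) / (n + alpha + beta - 2)

def mle_estimate(k, n):
    return k / n

for n in (10, 100, 10000):
    k = round(0.7 * n)   # hold the success rate at 70% at every sample size
    print(n, mle_estimate(k, n), map_estimate(k, n))
# The MLE stays at 0.7 while the MAP estimate moves from about 0.611
# toward 0.7 as n grows.
```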

**Summary:**

The three methods each have their merits and suit different scenarios. When you have little confidence in the prior probability, you can use maximum likelihood estimation (of course, the other two remain options). Bayesian estimation gives the full posterior probability distribution, while maximum a posteriori estimation is appropriate when you only need the value with the highest posterior probability. In general, maximum likelihood is the simplest to compute, while in some special cases Bayesian estimation performs better than maximum likelihood.

On the other hand, we can see that **maximum likelihood estimation differs substantially from Bayesian/MAP estimation, because the latter two use prior knowledge; when that knowledge is used properly, better results can be obtained. This is, in fact, one of the differences between the two schools of statistics (frequentists and Bayesians).**

(Figure: a comparison between parametric estimation and non-parametric estimation.)

References: http://guangchun.wordpress.com/2011/10/13/ml-bayes-map/


Parameter Estimation: Maximum Likelihood Estimation, Bayesian estimation, and Maximum Posterior Estimation