Application of maximum likelihood estimation (MLE) and maximum a posteriori (MAP) estimation in machine learning


Maximum likelihood estimation (MLE)

Suppose we are given a batch of data that we know was drawn at random from some distribution, but we do not know the distribution's specific parameters; in other words, "the model is determined, the parameters are unknown."

For example, in linear regression we assume that the samples follow a normal distribution, but we do not know its mean and variance. In logistic regression, the model maps the independent variable x to a probability p = g(x) through the logistic function, while the dependent variable y takes the discrete values 0 or 1 and therefore follows a Bernoulli distribution. Because the error term is Bernoulli rather than Gaussian, least squares cannot be used to estimate the model parameters, but maximum likelihood estimation can. The goal of MLE is to find the set of parameters $\theta$ under which the model assigns the highest probability to the observed data $X$:

$$\hat{\theta}_{MLE} = \arg\max_{\theta} P(X \mid \theta)$$

Here $P(X \mid \theta)$ is the likelihood function; it gives the probability of the observed data appearing under the parameter $\theta$. If we assume that the observations are independent, then:

$$P(X \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta)$$

To make differentiation convenient, we usually take the logarithm of this objective. Maximizing the likelihood function is therefore equivalent to maximizing the log-likelihood:

$$\hat{\theta}_{MLE} = \arg\max_{\theta} \sum_{i=1}^{n} \log P(x_i \mid \theta)$$
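
To make the role of the logarithm concrete, here is a minimal Python sketch (an illustration, not from the original article) comparing the raw likelihood with the log-likelihood for some hypothetical Bernoulli data; the two objectives agree up to floating-point error, but the log form is numerically stabler and easier to differentiate:

```python
import numpy as np

# Hypothetical Bernoulli observations: 1 = heads, 0 = tails
data = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])
theta = 0.3  # an arbitrary candidate parameter value

# Likelihood: a product of per-observation probabilities
# (this underflows quickly as the number of observations grows)
likelihood = np.prod(theta**data * (1 - theta)**(1 - data))

# Log-likelihood: a sum of logs, numerically stable
log_likelihood = np.sum(data * np.log(theta) + (1 - data) * np.log(1 - theta))

print(likelihood, np.exp(log_likelihood))  # identical up to rounding
```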

Let's take the simple example of tossing a coin. Suppose we have a coin whose heads and tails are not symmetric; record heads as H and tails as T. We toss it 10 times, and the result is 2 heads and 8 tails.

What is the probability that this coin lands heads?

Intuitively, the probability is 0.2. Now let us solve it with the idea of MLE. Each coin toss follows a Bernoulli distribution; if the probability of heads is $\theta$, the likelihood function is:

$$L(\theta) = \prod_{i=1}^{10} P(x_i \mid \theta) = \prod_{i=1}^{10} \theta^{x_i} (1-\theta)^{1-x_i}$$

where $x_i = 1$ denotes heads and $x_i = 0$ denotes tails. Taking the logarithm gives:

$$\ell(\theta) = \sum_{i=1}^{10} \left[ x_i \log\theta + (1-x_i)\log(1-\theta) \right] = 2\log\theta + 8\log(1-\theta)$$

Taking the derivative with respect to $\theta$:

$$\frac{d\ell}{d\theta} = \frac{\sum_i x_i}{\theta} - \frac{n - \sum_i x_i}{1-\theta} = \frac{2}{\theta} - \frac{8}{1-\theta}$$

Setting the derivative to 0, we easily obtain:

$$\hat{\theta} = \frac{\sum_i x_i}{n} = \frac{2}{10}$$

That's 0.2.
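
As a quick check, here is a small Python sketch (again an illustration, not part of the original article) that confirms the closed-form MLE for this data, both directly and by a brute-force search over the log-likelihood:

```python
import numpy as np

# 2 heads and 8 tails, matching the example above
data = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])

# Closed-form MLE for a Bernoulli parameter: the sample mean
theta_mle = data.mean()
print(theta_mle)  # 0.2

# Sanity check: grid search over the log-likelihood
grid = np.linspace(0.001, 0.999, 999)
loglik = data.sum() * np.log(grid) + (len(data) - data.sum()) * np.log(1 - grid)
print(grid[np.argmax(loglik)])  # ~0.2
```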

Maximum a posteriori probability (MAP)

The MLE above finds a set of parameters that maximizes the likelihood function, namely:

$$\hat{\theta}_{MLE} = \arg\max_{\theta} P(X \mid \theta)$$

Now the problem becomes a little more complicated: what if the parameter itself has a prior probability? In the coin-tossing example above, suppose our experience tells us that coins are usually symmetric, i.e. $\theta = 0.5$ is the most likely value and $\theta = 0.2$ is relatively unlikely. How should we estimate the parameter then? This is the problem MAP addresses. MAP maximizes the posterior probability, that is, the probability of the parameter given the observed data:

$$\hat{\theta}_{MAP} = \arg\max_{\theta} P(\theta \mid X)$$

Expanding the above with Bayes' rule:

$$P(\theta \mid X) = \frac{P(X \mid \theta)\, P(\theta)}{P(X)} \propto P(X \mid \theta)\, P(\theta)$$

where the denominator $P(X)$ can be dropped because it does not depend on $\theta$.

We can see that the first factor is the likelihood function and the second is the prior knowledge of the parameter. After taking the log, the objective becomes:

$$\hat{\theta}_{MAP} = \arg\max_{\theta} \left[ \log P(X \mid \theta) + \log P(\theta) \right]$$
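
Before working through the algebra, here is a small numerical sketch of this objective (the Beta(5, 5) prior below is an assumption chosen for illustration, not from the original article): the posterior is evaluated on a grid, and the normalizing constant $P(X)$ is ignored because it does not affect the argmax:

```python
import numpy as np

# Assumed prior: Beta(5, 5), encoding a belief that the coin is roughly symmetric
alpha, beta = 5.0, 5.0
data = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])  # 2 heads, 8 tails

grid = np.linspace(0.001, 0.999, 999)
log_prior = (alpha - 1) * np.log(grid) + (beta - 1) * np.log(1 - grid)
log_lik = data.sum() * np.log(grid) + (len(data) - data.sum()) * np.log(1 - grid)
log_post = log_lik + log_prior  # log P(X) is a constant and drops out of the argmax

print(grid[np.argmax(log_post)])  # MAP estimate, pulled toward 0.5 by the prior
```

With this prior the grid maximum lands near 0.33, between the MLE of 0.2 and the prior mode of 0.5.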

Back to the coin-tossing example. Suppose we have a prior estimate of the parameter: $\theta$ follows a beta distribution, namely:

$$P(\theta) = \frac{1}{B(\alpha, \beta)}\, \theta^{\alpha-1} (1-\theta)^{\beta-1}$$

and each coin toss follows a Bernoulli distribution:

$$P(x \mid \theta) = \theta^{x} (1-\theta)^{1-x}$$

The derivative of the objective function is then the sum of two terms:

$$\frac{d}{d\theta} \left[ \log P(X \mid \theta) + \log P(\theta) \right] = \frac{d \log P(X \mid \theta)}{d\theta} + \frac{d \log P(\theta)}{d\theta}$$

The first term was already derived in the MLE section above; the second is:

$$\frac{d \log P(\theta)}{d\theta} = \frac{\alpha - 1}{\theta} - \frac{\beta - 1}{1-\theta}$$

Setting the derivative to 0 and solving:

$$\hat{\theta}_{MAP} = \frac{\sum_i x_i + \alpha - 1}{n + \alpha + \beta - 2}$$

where $\sum_i x_i$ denotes the number of heads. As you can see here, the difference between MLE and MAP is that the MAP result additionally involves the parameters of the prior distribution.
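
A minimal sketch of this closed-form estimator (the $\alpha$ and $\beta$ values below are assumptions chosen for illustration) makes the effect of the prior visible:

```python
def map_estimate(n_heads, n, alpha, beta):
    """MAP estimate for a Bernoulli parameter under a Beta(alpha, beta) prior."""
    return (n_heads + alpha - 1) / (n + alpha + beta - 2)

n_heads, n = 2, 10
print(map_estimate(n_heads, n, alpha=1, beta=1))    # 0.2   (uniform prior: reduces to the MLE)
print(map_estimate(n_heads, n, alpha=5, beta=5))    # 0.333 (pulled toward 0.5)
print(map_estimate(n_heads, n, alpha=50, beta=50))  # 0.472 (stronger prior, closer to 0.5)
```

Note that a Beta(1, 1) prior is uniform over [0, 1], so its MAP estimate coincides with the MLE.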

Supplemental Knowledge: Beta distribution

The beta distribution is a common prior distribution. Its shape is controlled by two parameters, $\alpha$ and $\beta$, and its domain is $[0, 1]$:

$$P(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha-1} (1-x)^{\beta-1}$$

For $\alpha, \beta > 1$, the beta distribution reaches its maximum when $x$ equals:

$$x = \frac{\alpha - 1}{\alpha + \beta - 2}$$

So in the coin-tossing setting, if the prior knowledge is that the coin is symmetric, we let $\alpha = \beta$, which places the mode at 0.5. But even when they are equal, the magnitude of the two values still affects the final result: the larger $\alpha$ and $\beta$ are, the more concentrated the prior is around 0.5 and the less likely the estimate is to deviate from symmetry.
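
The following sketch (using scipy, as an illustration rather than part of the original article) shows this concentration: with $\alpha = \beta$, increasing both parameters piles more prior density onto 0.5 and less onto values away from it:

```python
from scipy.stats import beta as beta_dist

# Mode of Beta(a, b) is (a - 1) / (a + b - 2) for a, b > 1;
# with a == b the mode is 0.5, and larger values concentrate mass around it.
for a in (2, 5, 50):
    dist = beta_dist(a, a)
    print(a, round(dist.pdf(0.5), 3), round(dist.pdf(0.3), 3))
```

As $a$ grows, the density at 0.5 increases while the density at 0.3 shrinks toward zero.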
