Maximum likelihood estimation (MLE) and maximum a posteriori estimation (MAP) in machine learning

Maximum likelihood estimation (MLE)

Suppose we are given a set of data known to be drawn from some distribution, but we do not know the distribution's specific parameters. In other words, "the model is fixed, the parameters are unknown."

For example, in linear regression we assume the samples follow a normal distribution but do not know its mean and variance. In logistic regression we assume the label follows a Bernoulli distribution: the model maps the independent variable X through a logistic function to a probability p = g(x), while the observed y takes the discrete values 0 or 1. Because the error follows a Bernoulli distribution rather than a Gaussian, least squares cannot be used to estimate the model parameters; maximum likelihood estimation (MLE) is used instead. The goal of MLE is to find the parameters under which the model assigns the highest probability to the observed data:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} P(X \mid \theta)$$

Here $P(X \mid \theta)$ is the likelihood function, the probability of observing the data $X$ under the parameters $\theta$. Assuming each observation is independent, we have

$$P(X \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta)$$

To make differentiation convenient, we usually take the logarithm of the objective. Maximizing the likelihood function is then equivalent to maximizing the log-likelihood:

$$\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} \sum_{i=1}^{n} \log P(x_i \mid \theta)$$
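To make this concrete, here is a minimal numerical sketch of MLE (our illustration, not from the original article): it fits the mean and standard deviation of Gaussian samples by maximizing the log-likelihood with SciPy, and compares the result against the closed-form answers (sample mean and sample standard deviation).

```python
# Minimal MLE sketch, assuming NumPy/SciPy are available.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=1000)  # samples from N(3, 2^2)

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:                 # keep the scale parameter valid
        return np.inf
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
print("MLE:         mu=%.3f sigma=%.3f" % tuple(result.x))
print("Closed form: mu=%.3f sigma=%.3f" % (data.mean(), data.std()))
```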

Here is a simple coin-flipping example. Suppose we have an asymmetric coin; record heads as H and tails as T. Flipping it 10 times yields 2 heads and 8 tails.

Estimate the probability that the coin lands heads.

Obviously the answer is 0.2. Now let us solve it with the MLE mindset. Each coin flip follows a Bernoulli distribution; let the probability of heads be $\theta$. The likelihood of a single flip is:

$$P(x \mid \theta) = \theta^{x}(1-\theta)^{1-x}$$

where $x = 1$ means heads and $x = 0$ means tails. The log-likelihood of the 10 flips is then:

$$\ell(\theta) = \sum_{i=1}^{10} \log P(x_i \mid \theta) = 2\log\theta + 8\log(1-\theta)$$

Taking the derivative:

$$\frac{d\ell}{d\theta} = \frac{2}{\theta} - \frac{8}{1-\theta}$$

Setting the derivative to 0, it is easy to get:

$$\hat{\theta} = \frac{2}{10} = 0.2$$
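As a quick check in code (the flip sequence below is hypothetical; only the 2-heads-in-10 count comes from the example): the MLE for a Bernoulli parameter is just the sample frequency of heads, and a grid search over the log-likelihood lands on the same value.

```python
import numpy as np

# Hypothetical sequence with 2 heads (1) in 10 flips, matching the example's counts.
flips = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0])

# Closed form: the MLE is the sample frequency of heads.
print(flips.mean())  # 0.2

# Numerical check: evaluate the log-likelihood on a grid and take the argmax.
thetas = np.linspace(0.01, 0.99, 99)
log_lik = flips.sum() * np.log(thetas) + (len(flips) - flips.sum()) * np.log(1 - thetas)
print(thetas[log_lik.argmax()])  # ~0.2
```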



Maximum a posteriori estimation (MAP)

The MLE above finds a set of parameters that maximizes the likelihood function, i.e. $\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} P(X \mid \theta)$. Now make the problem a little more complicated: what if the parameter itself has a prior probability? In the coin-flipping example, suppose our experience tells us that coins are generally symmetric, so $\theta = 0.5$ is the most plausible value and $\theta = 0.2$ is relatively unlikely. How should we estimate the parameter then? This is the problem MAP addresses. MAP optimizes the posterior probability, i.e. it maximizes the probability of $\theta$ given the observed data:

$$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} P(\theta \mid X)$$

Expanding the formula above with Bayes' rule:

$$P(\theta \mid X) = \frac{P(X \mid \theta)\, P(\theta)}{P(X)} \propto P(X \mid \theta)\, P(\theta)$$

since $P(X)$ does not depend on $\theta$.

We can see that the first factor is the likelihood function and the second is the prior knowledge about the parameter. After taking the logarithm:

$$\hat{\theta}_{\text{MAP}} = \arg\max_{\theta} \left[ \sum_{i} \log P(x_i \mid \theta) + \log P(\theta) \right]$$
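In other words, the log-prior acts as a penalty added to the log-likelihood. A minimal sketch under assumptions of ours (Gaussian data with a known scale, and an N(0, 1) prior on the mean; neither comes from the article):

```python
# Sketch: MAP as penalized MLE -- maximize log-likelihood plus log-prior.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=20)  # few samples, so the prior matters

def neg_log_posterior(mu):
    log_lik = np.sum(norm.logpdf(data, loc=mu, scale=2.0))  # sigma treated as known
    log_prior = norm.logpdf(mu, loc=0.0, scale=1.0)         # assumed N(0, 1) prior on mu
    return -(log_lik + log_prior)

map_mu = minimize_scalar(neg_log_posterior, bounds=(-10, 10), method="bounded").x
print("MLE mu: %.3f" % data.mean())
print("MAP mu: %.3f (shrunk toward the prior mean 0)" % map_mu)
```

With only a few observations the prior visibly pulls the estimate toward its mean; as data accumulates, the likelihood dominates and MAP approaches MLE.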

Back to the coin-flipping example. Suppose the prior on the parameter $\theta$ follows a Beta distribution, namely:

$$P(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta-1}}{B(\alpha, \beta)}$$

and each coin flip follows a Bernoulli distribution:

$$P(x_i \mid \theta) = \theta^{x_i}(1-\theta)^{1-x_i}$$

Then the derivative of the objective function is:

$$\frac{d}{d\theta}\left[\sum_{i=1}^{n} \log P(x_i \mid \theta) + \log P(\theta)\right]$$

The derivative of the first term was already given in the MLE section above; the derivative of the second term is:

$$\frac{d}{d\theta} \log P(\theta) = \frac{\alpha - 1}{\theta} - \frac{\beta - 1}{1 - \theta}$$

Setting the derivative to 0 and solving gives:

$$\hat{\theta}_{\text{MAP}} = \frac{n_H + \alpha - 1}{n + \alpha + \beta - 2}$$

where $n_H$ denotes the number of heads among the $n$ flips. Seen this way, the difference between MLE and MAP is that the MAP result carries extra terms contributed by the parameters of the prior distribution.
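A quick sketch of the closed-form result just derived (the helper name map_coin is our own):

```python
def map_coin(n_heads, n, alpha, beta):
    # MAP estimate for a Bernoulli parameter with a Beta(alpha, beta) prior,
    # per the formula above: (n_heads + alpha - 1) / (n + alpha + beta - 2).
    return (n_heads + alpha - 1) / (n + alpha + beta - 2)

print(map_coin(2, 10, 1, 1))    # 0.2   -- Beta(1,1) is uniform: MAP equals MLE
print(map_coin(2, 10, 5, 5))    # 0.333 -- symmetric prior pulls toward 0.5
print(map_coin(2, 10, 50, 50))  # 0.472 -- stronger prior pulls even closer to 0.5
```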

Supplemental Knowledge: Beta distribution

The Beta distribution is a common prior distribution. Its shape is controlled by two parameters, $\alpha$ and $\beta$, and its support is the interval $[0, 1]$.

The Beta distribution attains its maximum (its mode) at:

$$x = \frac{\alpha - 1}{\alpha + \beta - 2}$$

So in the coin-flipping setting, if the prior knowledge is that the coin is symmetric, we let $\alpha = \beta$, which places the mode at 0.5. But even when the two are equal, their common value still affects the final result: the larger the two values, the more the prior concentrates around 0.5 and the less likely the estimate is to deviate from symmetry, as the sketch below illustrates.
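A small sketch using scipy.stats.beta (the particular values of $\alpha = \beta$ are our choice): the mode stays at 0.5 while the distribution tightens as the common parameter grows.

```python
from scipy.stats import beta

for a in (2, 5, 50):
    mode = (a - 1) / (2 * a - 2)   # mode (alpha-1)/(alpha+beta-2) with alpha == beta
    spread = beta.std(a, a)        # standard deviation of Beta(a, a)
    print("alpha = beta = %2d  mode = %.2f  std = %.3f" % (a, mode, spread))
```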
