Parameter estimation method of text language model--maximum likelihood estimation, MAP, Bayesian estimation

Reposted from: http://blog.csdn.net/woshizhouxiang/article/details/17556241

The emphasis is on mastering the methods of model parameter estimation in order to optimize the model.


Text language models, represented by pLSA and LDA, are a hot topic in statistical natural language processing today. Such models usually specify a probabilistic graphical model for the text generation process and then estimate the model parameters from observed corpus data. Once we have the language model and its estimated parameters, many important applications become possible, such as text feature dimensionality reduction and text topic analysis. This article introduces three parameter estimation methods used in text analysis: maximum likelihood estimation (MLE), maximum a posteriori estimation (MAP), and Bayesian estimation.


1. Maximum likelihood estimation (MLE)

First, recall Bayes' formula:

$$p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}$$
This formula, also called the inverse probability formula, expresses the posterior probability as a computation based on the likelihood function and the prior probability, i.e.

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{evidence}}$$
Maximum likelihood estimation takes the parameter value at which the likelihood function attains its maximum as the estimate. For a set of independent observations $X = \{x_1, \dots, x_N\}$, the likelihood function can be written as

$$L(\theta \mid X) = p(X \mid \theta) = \prod_{x \in X} p(x \mid \theta)$$
Because of the product, it is usually simpler to work with the logarithm of the likelihood function, i.e. the log-likelihood. The maximum likelihood estimation problem can then be written as

$$\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} \mathcal{L}(\theta \mid X) = \arg\max_{\theta} \sum_{x \in X} \log p(x \mid \theta)$$
This is an optimization problem over $\theta$. It is usually solved by differentiating the log-likelihood and setting the derivative to zero to find the extremum; the value of $\theta$ at which the function attains its maximum is the model parameter we estimate.
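As a quick numerical illustration (my addition, not part of the original post), the sketch below maximizes a Bernoulli log-likelihood directly with an off-the-shelf optimizer and compares the result with the closed form obtained by setting the derivative to zero; the counts n1 and n0 are made-up example data.

# Minimal sketch: maximize a Bernoulli/binomial log-likelihood numerically.
# n1 and n0 are hypothetical counts of heads and tails, not data from the post.
import numpy as np
from scipy.optimize import minimize_scalar

n1, n0 = 7, 3  # hypothetical observed counts

def neg_log_likelihood(p):
    # log L(p | X) = n1*log(p) + n0*log(1 - p); negated for the minimizer
    return -(n1 * np.log(p) + n0 * np.log(1.0 - p))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x)        # numerical maximizer, approximately 0.7
print(n1 / (n1 + n0))  # closed form from setting the derivative to zero: 0.7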

Taking the familiar coin-toss experiment as an example, the results of N independent trials follow a binomial distribution with parameter p, the probability of the event in each trial; here we take p to be the probability of getting heads. To estimate p by maximum likelihood, the likelihood function can be written as

$$P(X \mid p) = \prod_{i=1}^{N} p(x_i \mid p) = p^{n_1} (1 - p)^{n_0}$$
where $n_i$ denotes the number of trials whose outcome is $i$ ($n_1$ heads, $n_0$ tails). Setting the derivative of the log-likelihood with respect to p to zero, the extremum point of the likelihood function satisfies

$$\frac{\partial \log P(X \mid p)}{\partial p} = \frac{n_1}{p} - \frac{n_0}{1 - p} = 0$$
Solving, the maximum likelihood estimate of the parameter p is

$$\hat{p} = \frac{n_1}{n_1 + n_0} = \frac{n_1}{N}$$
It can be seen that the maximum likelihood estimate of the per-trial probability p in the binomial distribution is simply the relative frequency with which the event occurs in the N independent repeated trials.

If we perform 20 trials and observe 12 heads and 8 tails,

then according to maximum likelihood estimation the parameter value is p = 12/20 = 0.6.
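A tiny sketch of this computation (my addition), with the 12-heads/8-tails outcome encoded as a list of 1s and 0s:

# MLE for the coin example: the estimate is just the relative frequency of heads.
outcomes = [1] * 12 + [0] * 8  # 12 heads and 8 tails, as in the example above
p_mle = sum(outcomes) / len(outcomes)
print(p_mle)  # 0.6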


2. Maximum a posteriori estimation (MAP)

Maximum a posteriori estimation is similar to maximum likelihood estimation; the difference is that a prior $p(\theta)$ is allowed in the function being maximized. That is, we no longer require the likelihood function alone to be maximal, but instead require the whole posterior probability computed by Bayes' formula to be maximal:

$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \frac{p(X \mid \theta)\, p(\theta)}{p(X)} = \arg\max_{\theta} p(X \mid \theta)\, p(\theta)$$
Note that $p(X)$ here does not depend on the parameter, so maximizing the posterior is equivalent to maximizing the numerator. Compared with maximum likelihood estimation, the objective now carries an extra term: the logarithm of the prior probability. In practical applications, this prior can encode general knowledge that people already know or accept. For example, in the coin-toss experiment, the probability of heads on each toss should itself follow a probability distribution that attains its maximum at 0.5; that distribution is the prior distribution. The parameters of the prior distribution are called hyperparameters, i.e.

$$p(\theta) = p(\theta \mid \alpha)$$
By the same reasoning, when the posterior probability above attains its maximum, we obtain the parameter value given by the MAP estimate. Given the observed sample data, the probability of a new value $\tilde{x}$ occurring is then

$$p(\tilde{x} \mid X) \approx p(\tilde{x} \mid \hat{\theta}_{\mathrm{MAP}})$$
Returning to the coin-toss example, we want a prior distribution for p that attains its maximum at 0.5; for this we can choose the Beta distribution,

$$\mathrm{Beta}(p \mid \alpha, \beta) = \frac{p^{\alpha - 1} (1 - p)^{\beta - 1}}{B(\alpha, \beta)}$$
where the Beta function expands as

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\, \Gamma(\beta)}{\Gamma(\alpha + \beta)}$$
When $x$ is a positive integer, the Gamma function reduces to the factorial:

$$\Gamma(x) = (x - 1)!$$
The support of the Beta distribution is $[0, 1]$, so it can be used to generate normalized probability values. The probability density function of the Beta distribution for different parameter settings is shown below.

[Figure omitted: Beta probability density functions for several settings of α and β.]
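In place of the omitted figure, here is a small sketch (my addition, using scipy.stats) that evaluates the Beta density for a few parameter settings; with α = β the density is symmetric and peaks at p = 0.5, which is the shape we want for a "fair coin" prior.

# Evaluate the Beta probability density for a few hyperparameter settings.
import numpy as np
from scipy.stats import beta

p = np.linspace(0.001, 0.999, 999)
for a, b in [(2, 2), (5, 5), (2, 5), (5, 2)]:
    density = beta.pdf(p, a, b)
    mode = p[np.argmax(density)]
    print(f"alpha={a}, beta={b}: density peaks near p = {mode:.2f}")
# alpha=2, beta=2 and alpha=5, beta=5 peak at 0.50;
# alpha=2, beta=5 peaks at 0.20; alpha=5, beta=2 peaks at 0.80.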


We take $\alpha = \beta = 5$, so that the prior distribution attains its maximum at 0.5. Now we solve for the extremum point of the MAP objective; taking the derivative with respect to p in the same way as before, we have

$$\frac{\partial}{\partial p} \log \big( P(X \mid p)\, \mathrm{Beta}(p \mid \alpha, \beta) \big) = \frac{n_1 + \alpha - 1}{p} - \frac{n_0 + \beta - 1}{1 - p} = 0$$
Solving, the maximum a posteriori estimate of the parameter p is

$$\hat{p}_{\mathrm{MAP}} = \frac{n_1 + \alpha - 1}{n_1 + n_0 + \alpha + \beta - 2}$$
Compared with the maximum likelihood result, the estimate now contains the extra pseudo-counts $\alpha - 1$ and $\beta - 1$; this is the prior at work. And the larger the hyperparameters, the more observations are needed to change the belief expressed by the prior, since the corresponding Beta distribution is then more tightly concentrated around its maximum.

If we perform 20 trials with 12 heads and 8 tails, then with $\alpha = \beta = 5$

the parameter estimated by MAP is p = 16/28 ≈ 0.571, smaller than the maximum likelihood estimate of 0.6. This shows the effect on the parameter estimate of the prior belief that a coin is generally fair.
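A minimal sketch of this MAP calculation (my addition; the Beta(5, 5) prior and the 12/8 counts are the values used above), which also shows how a stronger prior pulls the estimate further toward 0.5:

# MAP estimate for the coin with a Beta(alpha, beta) prior:
# p_MAP = (n1 + alpha - 1) / (n1 + n0 + alpha + beta - 2)
def map_estimate(n1, n0, alpha, beta):
    return (n1 + alpha - 1) / (n1 + n0 + alpha + beta - 2)

n1, n0 = 12, 8  # 12 heads, 8 tails, as in the example above

print(map_estimate(n1, n0, alpha=5, beta=5))    # 16/28, about 0.571
print(map_estimate(n1, n0, alpha=1, beta=1))    # uniform prior: reduces to the MLE, 0.6
print(map_estimate(n1, n0, alpha=50, beta=50))  # stronger prior pulls the estimate toward 0.5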


3. Bayesian estimation

Bayesian estimation goes a step further than MAP: the value of the parameter is not estimated directly; instead, the parameter is treated as following a probability distribution. Recall Bayes' formula once more:

$$p(\theta \mid X) = \frac{p(X \mid \theta)\, p(\theta)}{p(X)}$$
Now we no longer require the posterior probability to be maximal, but we do need to compute the probability of the observed evidence, which is expanded by the law of total probability:

$$p(X) = \int_{\theta} p(X \mid \theta)\, p(\theta)\, d\theta$$
When new data are observed, the posterior distribution can be updated accordingly. However, computing this marginal probability in general is the tricky part of Bayesian estimation.

So how do we make predictions with Bayesian estimation? If we want the probability of a new value $\tilde{x}$, it can be computed as

$$p(\tilde{x} \mid X) = \int_{\theta} p(\tilde{x} \mid \theta)\, p(\theta \mid X)\, d\theta$$
Note that the second factor in the integrand is the full posterior distribution $p(\theta \mid X)$, not a single point estimate of $\theta$; this is a major difference from MLE and MAP.
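To make the integral concrete, here is a small sketch (my addition) that evaluates the posterior predictive probability of heads for the coin example by numerical integration, building the posterior from likelihood × prior and normalizing by the evidence:

# Posterior predictive p(next toss is heads | X) for the coin, by numerical integration.
# Prior: Beta(a, b); data: n1 heads and n0 tails (values from the example above).
from scipy.integrate import quad
from scipy.stats import beta

a, b = 5.0, 5.0   # hyperparameters of the Beta prior
n1, n0 = 12, 8    # observed heads and tails

def likelihood(p):
    return p ** n1 * (1.0 - p) ** n0

def prior(p):
    return beta.pdf(p, a, b)

# Evidence p(X): integrate likelihood * prior over p (law of total probability).
evidence, _ = quad(lambda p: likelihood(p) * prior(p), 0.0, 1.0)

def posterior(p):
    return likelihood(p) * prior(p) / evidence

# p(heads | X): integrate p * posterior(p) over p.
p_heads, _ = quad(lambda p: p * posterior(p), 0.0, 1.0)
print(p_heads)  # about 0.567, i.e. (n1 + a) / (n1 + n0 + a + b)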

We again use the coin toss as the example to illustrate. As in MAP, we assume the prior is a Beta distribution, but when constructing the Bayesian estimate we do not approximate the parameter by the value that maximizes the posterior; instead we take the expectation of p under the posterior (which is again a Beta distribution), giving

$$E[p \mid X] = \frac{n_1 + \alpha}{n_1 + n_0 + \alpha + \beta}$$
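A closed-form check of this expectation (my addition), matching the numerical integration sketched above:

# Bayesian estimate of p: the posterior is Beta(alpha + n1, beta + n0),
# and the estimate is its expectation (the posterior mean).
def bayes_estimate(n1, n0, alpha, beta):
    return (n1 + alpha) / (n1 + n0 + alpha + beta)

print(bayes_estimate(12, 8, alpha=5, beta=5))  # 17/30, about 0.567: between the prior mean 0.5 and the MLE 0.6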

