I recently came across the pLSA model. Because the topic is introduced into the model as a latent variable, the expectation-maximization (EM) algorithm is needed to estimate its parameters.
Why an EM algorithm is required
The basic problem of mathematical statistics is to draw inferences about the distribution of a population, or about numerical characteristics of that distribution, from the information provided by a sample. The population is a random variable with a definite distribution, and each i.i.d. sample drawn from the population is a random variable with the same distribution as the population.
Parameter estimation refers to the problem in which the type of the population distribution is known but some parameters are unknown: let $y_1,\dots,y_n$ be an i.i.d. sample from the population $\mathbf{y}$, and record the sample observations as $Y=(y_1,\dots,y_n)^\top$. If the random variables $y_1,\dots,y_n$ are all observable, the parameter $\theta$ can be estimated directly by maximum likelihood estimation (MLE).
However, if the model contains unobservable latent variables, applying MLE is no longer straightforward. The EM algorithm exists to solve parameter estimation problems with latent variables.
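To contrast the two situations, here is a minimal Python sketch (all data simulated, all parameter values illustrative): with fully observed Gaussian data the MLE of the mean is a closed-form sample average, while a two-component mixture with a hidden component label puts a sum over $z$ inside the logarithm, so no closed form falls out of setting the derivative to zero.

```python
import math
import random

random.seed(0)

# Fully observed case: for i.i.d. Gaussian samples with known variance,
# the MLE of the mean is simply the sample average -- a closed form.
data = [random.gauss(2.0, 1.0) for _ in range(1000)]
mu_mle = sum(data) / len(data)  # closed-form maximum likelihood estimate
print(mu_mle)  # close to the true mean 2.0

# Latent-variable case (hypothetical two-component Gaussian mixture):
# each y_j comes from one of two Gaussians, but the label z_j is hidden.
# The log-likelihood has a sum over z INSIDE the logarithm, which is
# exactly the situation the EM algorithm targets.
def mixture_loglik(ys, pi, mu0, mu1, sigma=1.0):
    def norm_pdf(y, mu):
        return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return sum(math.log(pi * norm_pdf(y, mu0) + (1 - pi) * norm_pdf(y, mu1)) for y in ys)
```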
Derivation and flow of the EM algorithm
Below we consider the parameter estimation problem with a latent variable $\mathbf{z}$. Record the observed data as $Y=(y_1,\dots,y_n)^\top$ and the unobservable data as $Z=(z_1,\dots,z_n)^\top$; the likelihood function of the observed data is then $$P(Y|\theta)=\prod_{j=1}^n P(y_j|\theta)=\prod_{j=1}^n\sum_Z\big[P(Z|\theta)\,P(y_j|Z,\theta)\big]$$
where the summation runs over all possible values of $Z$.
For convenience, it is written in the form $$P(Y|\theta)=\sum_Z P(Z|\theta)\,P(Y|Z,\theta)$$
The log-likelihood is $$L(\theta)=\ln P(Y|\theta)=\ln\sum_Z P(Z|\theta)\,P(Y|Z,\theta)$$
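As a concrete check of this formula, the sketch below evaluates $L(\theta)$ for a hypothetical two-coin model: a hidden $z$ picks one of two biased coins, and each $y_j$ counts heads in $m$ tosses. All numbers are illustrative.

```python
import math
from math import comb

# Hypothetical model: z in {0, 1} picks one of two coins; given z,
# y is Binomial(m, p_z). theta = (pi, p0, p1) with P(z=0) = pi.
def log_lik(ys, m, pi, p0, p1):
    total = 0.0
    for y in ys:
        # Marginalize the latent z inside the logarithm:
        # P(y|theta) = sum_z P(z|theta) P(y|z,theta)
        py = (pi * comb(m, y) * p0**y * (1 - p0)**(m - y)
              + (1 - pi) * comb(m, y) * p1**y * (1 - p1)**(m - y))
        total += math.log(py)
    return total

ys = [9, 8, 2, 1, 9]             # heads out of m = 10 tosses per trial
L = log_lik(ys, 10, 0.5, 0.8, 0.2)
```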
The EM algorithm is an iterative algorithm that approaches a maximum of the objective function step by step, so we want the objective value after each iteration to exceed the value after the previous one. Let $\theta_n$ be the parameter value after the $n$-th iteration; we aim to make $L(\theta_{n+1})>L(\theta_n)$. So consider:
$$L(\theta)-L(\theta_n)=\ln\sum_Z\big[P(Z|\theta)\,P(Y|Z,\theta)\big]-\ln P(Y|\theta_n)$$
Using Jensen's inequality,
$$\ln\sum_j\lambda_j y_j\ge\sum_j\lambda_j\ln y_j,\quad \lambda_j\ge 0,\ \sum_j\lambda_j=1$$
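This inequality is easy to verify numerically; the snippet below checks it for arbitrary (purely illustrative) convex weights and positive values.

```python
import math

# Numerical check of Jensen's inequality for ln (a concave function):
# ln(sum_j lambda_j * y_j) >= sum_j lambda_j * ln(y_j)
# for weights lambda_j >= 0 summing to 1 and positive y_j.
lam = [0.2, 0.5, 0.3]   # illustrative convex weights (sum to 1)
ys = [1.0, 4.0, 9.0]    # illustrative positive values

lhs = math.log(sum(l * y for l, y in zip(lam, ys)))
rhs = sum(l * math.log(y) for l, y in zip(lam, ys))
assert lhs >= rhs       # Jensen's inequality holds
```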
Since $\sum_Z P(Z|Y,\theta_n)=1$, the first term of $L(\theta)-L(\theta_n)$ satisfies
$$\begin{aligned} \ln\Big(\sum_Z P(Z|\theta)\,P(Y|Z,\theta)\Big) &=\ln\Big(\sum_Z P(Z|Y,\theta_n)\,\frac{P(Z|\theta)\,P(Y|Z,\theta)}{P(Z|Y,\theta_n)}\Big) \\ &\ge \sum_Z P(Z|Y,\theta_n)\,\ln\frac{P(Z|\theta)\,P(Y|Z,\theta)}{P(Z|Y,\theta_n)} \end{aligned}$$
For the second term (again using $\sum_Z P(Z|Y,\theta_n)=1$),
$$-\ln P(Y|\theta_n)=-\sum_Z P(Z|Y,\theta_n)\,\ln P(Y|\theta_n)$$
Hence a lower bound on $L(\theta)-L(\theta_n)$ is
$$\begin{aligned} L(\theta)-L(\theta_n) &\ge\sum_Z P(Z|Y,\theta_n)\,\ln\frac{P(Z|\theta)\,P(Y|Z,\theta)}{P(Z|Y,\theta_n)}-\sum_Z P(Z|Y,\theta_n)\,\ln P(Y|\theta_n) \\ &=\sum_Z\Big[P(Z|Y,\theta_n)\,\ln\frac{P(Z|\theta)\,P(Y|Z,\theta)}{P(Z|Y,\theta_n)}-P(Z|Y,\theta_n)\,\ln P(Y|\theta_n)\Big] \\ &=\sum_Z P(Z|Y,\theta_n)\,\ln\frac{P(Z|\theta)\,P(Y|Z,\theta)}{P(Y|\theta_n)\,P(Z|Y,\theta_n)} \end{aligned}$$
Define the function $l(\theta|\theta_n)$:
$$l(\theta|\theta_n)\triangleq L(\theta_n)+\sum_Z P(Z|Y,\theta_n)\,\ln\frac{P(Z|\theta)\,P(Y|Z,\theta)}{P(Y|\theta_n)\,P(Z|Y,\theta_n)}$$
Thus $L(\theta)\ge l(\theta|\theta_n)$, which means that $l(\theta|\theta_n)$ is a lower bound of $L(\theta)$. (In addition, the equality $L(\theta_n)=l(\theta_n|\theta_n)$ holds.)
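Both facts, the bound $L(\theta)\ge l(\theta|\theta_n)$ and the equality at $\theta=\theta_n$, can be checked numerically on a tiny discrete model; the sketch below uses a hypothetical two-coin mixture with a single observation, with all numbers purely illustrative.

```python
import math
from math import comb

# Tiny illustrative model: z in {0, 1} with P(z=0) = pi, and
# y ~ Binomial(m, p_z) given z. theta = (pi, p0, p1).
m, y = 10, 7

def joint(z, theta):
    """P(z|theta) * P(y|z,theta) for the single observation y."""
    pi, p0, p1 = theta
    p = p0 if z == 0 else p1
    pz = pi if z == 0 else 1 - pi
    return pz * comb(m, y) * p**y * (1 - p)**(m - y)

def L(theta):
    """Log-likelihood L(theta) = ln sum_z P(z|theta) P(y|z,theta)."""
    return math.log(sum(joint(z, theta) for z in (0, 1)))

def l_bound(theta, theta_n):
    """l(theta|theta_n) = L(theta_n)
       + sum_z P(z|Y,theta_n) ln[ P(z|theta)P(y|z,theta)
                                  / (P(Y|theta_n) P(z|Y,theta_n)) ]"""
    py_n = sum(joint(z, theta_n) for z in (0, 1))  # P(Y|theta_n)
    total = L(theta_n)
    for z in (0, 1):
        post = joint(z, theta_n) / py_n            # P(z|Y,theta_n)
        total += post * math.log(joint(z, theta) / (py_n * post))
    return total

theta_n = (0.5, 0.6, 0.3)
for theta in [(0.4, 0.7, 0.2), (0.6, 0.5, 0.4), theta_n]:
    assert L(theta) >= l_bound(theta, theta_n) - 1e-12   # lower bound
assert abs(L(theta_n) - l_bound(theta_n, theta_n)) < 1e-12  # touches at theta_n
```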
Our aim is to make $L(\theta_{n+1})>L(\theta_n)$, and $l(\theta|\theta_n)$ is a lower bound of $L(\theta)$ that coincides with it at $\theta=\theta_n$. Therefore, for any $\theta$ with $l(\theta|\theta_n)>l(\theta_n|\theta_n)$ we get $L(\theta)\ge l(\theta|\theta_n)>l(\theta_n|\theta_n)=L(\theta_n)$: any $\theta$ that increases $l(\theta|\theta_n)$ also increases $L(\theta)$.
In other words, the EM algorithm optimizes the log-likelihood indirectly, by optimizing a lower bound of the log-likelihood.
So, how should this lower bound be optimized?
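The standard answer: dropping the terms of $l(\theta|\theta_n)$ that do not depend on $\theta$ leaves the expected complete-data log-likelihood (the Q function); the E-step computes the posterior $P(Z|Y,\theta_n)$ and the M-step maximizes the resulting weighted likelihood. A minimal sketch for a hypothetical two-coin mixture (all data and starting values illustrative), asserting that the log-likelihood never decreases across iterations:

```python
import math
from math import comb

# Hypothetical data: each trial reports heads out of m = 10 tosses of a coin
# chosen at random (hidden z) from two coins with unknown biases.
m = 10
ys = [9, 8, 9, 2, 1, 2, 8, 1]

def binom(y, p):
    return comb(m, y) * p**y * (1 - p)**(m - y)

def log_lik(pi, p0, p1):
    return sum(math.log(pi * binom(y, p0) + (1 - pi) * binom(y, p1)) for y in ys)

pi, p0, p1 = 0.5, 0.6, 0.4          # arbitrary starting point
prev = log_lik(pi, p0, p1)
for _ in range(50):
    # E-step: responsibility r_j = P(z_j = 0 | y_j, theta_n) for each trial
    r = [pi * binom(y, p0) / (pi * binom(y, p0) + (1 - pi) * binom(y, p1))
         for y in ys]
    # M-step: maximize the Q function (a weighted complete-data MLE)
    pi = sum(r) / len(ys)
    p0 = sum(ri * y for ri, y in zip(r, ys)) / (m * sum(r))
    p1 = sum((1 - ri) * y for ri, y in zip(r, ys)) / (m * sum(1 - ri for ri in r))
    cur = log_lik(pi, p0, p1)
    assert cur >= prev - 1e-9        # EM never decreases the log-likelihood
    prev = cur
```

On this toy data the iterations separate the trials into a high-heads and a low-heads group, with the coin biases converging toward the group averages.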