EM Algorithm 1: Principle


The EM algorithm is used for maximum likelihood estimation of the parameters of probabilistic models with latent variables. What is a probabilistic model with latent variables? For example, suppose there are 3 coins, denoted A, B, and C, whose probabilities of landing heads are r, p, and q respectively. Each trial proceeds as follows: first toss coin A; if it lands heads, toss B; if it lands tails, toss C. Record heads as 1 and tails as 0. Ten independent trials produce the results: 1101001011. Given only this result, without knowing the process, how do we estimate r, p, q? That is, we can see the outcome of each trial, but we do not know whether it was produced by B or C; in other words, the result of A is unknown. This is the so-called latent variable. If the observed variable is denoted Y and the latent variable (the result of A) is denoted Z, then the likelihood function of the observed data is:

\(P(Y|\theta) = \prod_i \left[\, r p^{y_i} (1-p)^{1-y_i} + (1-r)\, q^{y_i} (1-q)^{1-y_i} \right]\)
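To make the model concrete, here is a minimal Python sketch (not from the original article) that evaluates this likelihood for the observed sequence, with arbitrary guess values for r, p, q:

```python
# Likelihood of the three-coin model:
# P(Y|theta) = prod_i [ r p^{y_i} (1-p)^{1-y_i} + (1-r) q^{y_i} (1-q)^{1-y_i} ]
def three_coin_likelihood(y, r, p, q):
    likelihood = 1.0
    for yi in y:
        via_b = r * (p if yi == 1 else 1 - p)        # coin A heads -> outcome from B
        via_c = (1 - r) * (q if yi == 1 else 1 - q)  # coin A tails -> outcome from C
        likelihood *= via_b + via_c
    return likelihood

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]  # the observed sequence 1101001011
print(three_coin_likelihood(data, r=0.5, p=0.6, q=0.4))  # arbitrary guess values
```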

Generalizing the above model: given observed data \(\{x_1, x_2, \ldots, x_m\}\), generated by a model with observed variable x and latent variable z and model parameter \(\theta\), we want to maximize the following log-likelihood:

\(L(\theta) = \displaystyle\sum_{i=1}^{m} \log p(x_i;\theta) = \displaystyle\sum_{i=1}^{m} \log \sum_{z_i} p(x_i, z_i;\theta)\)

It is very difficult to solve this optimization problem directly. The EM algorithm solves it iteratively, alternating between an expectation step (E-step) and a maximization step (M-step). Its main idea is to find a lower bound on the objective function, then improve that lower bound step by step to obtain a solution; the solution, however, is not necessarily the global optimum.

Let's take a look at how the lower bound is derived:

\(\displaystyle\sum_{i=1}^{m} \log p(x_i;\theta)\)

\(= \displaystyle\sum_{i=1}^{m} \log \sum_{z_i} p(x_i, z_i;\theta)\)

For each i, suppose \(Q_i\) is a probability distribution over z:

\(= \displaystyle\sum_{i=1}^{m} \log \sum_{z_i} Q_i(z_i) \frac{p(x_i, z_i;\theta)}{Q_i(z_i)}\)

\(\geq \displaystyle\sum_{i=1}^{m} \sum_{z_i} Q_i(z_i) \log \frac{p(x_i, z_i;\theta)}{Q_i(z_i)}\) ---- (eq1)

This step uses Jensen's inequality: because the log function is concave (its second derivative is negative), we have \(\log(E[X]) \geq E[\log(X)]\). Since \(Q_i\) is a probability distribution, \(\sum_{z_i} Q_i(z_i) \frac{p(x_i, z_i;\theta)}{Q_i(z_i)}\) can be viewed as an expectation, and applying Jensen's inequality [2] gives the result above.
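As a quick numeric sanity check of \(\log(E[X]) \geq E[\log(X)]\), here is a short sketch using a few arbitrary positive values as a stand-in for X:

```python
import math

# Jensen's inequality for the concave log: log(E[X]) >= E[log(X)].
xs = [0.5, 1.0, 2.0, 4.0]  # arbitrary positive sample values, uniform weights
log_of_mean = math.log(sum(xs) / len(xs))             # log(E[X]) ~= 0.629
mean_of_log = sum(math.log(x) for x in xs) / len(xs)  # E[log X] ~= 0.347
assert log_of_mean >= mean_of_log
print(log_of_mean, ">=", mean_of_log)
```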

Now we have a lower bound, but the \(Q_i\) inside it is still unknown. How do we determine \(Q_i\)? If we already have a guess for \(\theta\), then naturally we want the lower bound at \(\theta\) to be as close as possible to the likelihood function at \(\theta\); that is, we make inequality (eq1) hold with equality at \(\theta\). Because log is strictly concave, \(\log(E[X]) = E[\log(X)]\) holds only when X is a constant. Based on this property, set

\(\frac{p(x_i, z_i;\theta)}{Q_i(z_i)} = c\)

From this we can derive

\(\frac{\sum_z p(x_i, z;\theta)}{\sum_z Q_i(z)} = c\) (this is easy to see: if \(a_1/b_1 = a_2/b_2 = a_3/b_3 = c\), then \((a_1+a_2+a_3)/(b_1+b_2+b_3) = c\))

Hence, using \(\sum_z Q_i(z) = 1\),

\(Q_i(z_i) = \frac{p(x_i, z_i;\theta)}{\sum_z p(x_i, z;\theta)}\)

\(= \frac{p(x_i, z_i;\theta)}{p(x_i;\theta)}\)

\(= p(z_i|x_i;\theta)\)

Therefore, \(Q_i\) is the posterior probability of \(z_i\) given \(x_i\) and \(\theta\).

This is the E-step. To sum up: assuming \(\theta\) is known, first derive the lower bound of the likelihood function, then find the distribution \(Q_i\) of the latent variable.

In the next M-step, since the E-step has already fixed \(Q_i\), this step finds the \(\theta\) that maximizes (eq1), i.e. the maximizer of the lower bound.

The \(\theta\) from the M-step is then fed back into the E-step, and the cycle repeats until convergence:

Repeat until convergence {

E-step: for each i, set

\(Q_i(z_i) := p(z_i|x_i;\theta)\)

M-step: set

\(\theta := \arg\max_{\theta} \displaystyle\sum_{i=1}^{m} \sum_{z_i} Q_i(z_i) \log \frac{p(x_i, z_i;\theta)}{Q_i(z_i)}\)

}
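In code, the loop has the same shape. Below is a generic sketch of this iteration; `e_step` and `m_step` are hypothetical model-specific callables (not part of the original article), and \(\theta\) is assumed to be a tuple of numeric parameters:

```python
def em(data, theta, e_step, m_step, max_iter=100, tol=1e-8):
    """Generic EM loop: alternate E-step and M-step until theta stops moving.

    e_step(data, theta) -> Q, the posteriors p(z_i | x_i; theta)
    m_step(data, Q)     -> new theta maximizing the lower bound (eq1)
    theta               -> tuple of numeric model parameters
    """
    for _ in range(max_iter):
        Q = e_step(data, theta)      # E-step: Q_i(z_i) := p(z_i | x_i; theta)
        new_theta = m_step(data, Q)  # M-step: theta := argmax of the lower bound
        if max(abs(a - b) for a, b in zip(new_theta, theta)) < tol:
            break                    # parameters stopped moving: converged
        theta = new_theta
    return theta
```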

The following figure (from [4]) depicts the EM process more visually: the E-step raises the lower bound so that its value at \(\theta\) matches the objective function, and the M-step finds the maximum of the lower-bound function and takes it as the new \(\theta\).

If we define \(J(Q,\theta) = \displaystyle\sum_{i=1}^{m}\sum_{z_i} Q_i(z_i) \log \frac{p(x_i, z_i;\theta)}{Q_i(z_i)}\),

then the EM algorithm can be regarded as coordinate ascent on the function J: the E-step maximizes J with respect to Q, and the M-step maximizes it with respect to \(\theta\).

The EM algorithm is convergent; for a proof, see [3]. However, EM is liable to fall into a local optimum and is sensitive to the initial value.

The following applies the EM algorithm to the three-coin problem from the beginning of this article.

Assume the j-th iteration has been completed, so we now have \(\theta^j = (r^j, c^j, q^j)\) (to avoid notational confusion, the heads probability p of coin B in the parameters is renamed c).

E-step:

Here we need \(P(z_i|x_i;\theta^j)\). Because z is binary, for simplicity we can directly compute the probability that \(z_i = 1\), using Bayes' rule:

(To keep the notation simple, the iteration superscript j is dropped below; remember that r, c, q are known.)

\(P(z_i=1|x_i;\theta) = \frac{p(x_i|z_i=1;\theta)\, p(z_i=1;\theta)}{p(x_i|z_i=1;\theta)\, p(z_i=1;\theta) + p(x_i|z_i=0;\theta)\, p(z_i=0;\theta)}\)

\(= \frac{r c^{x_i} (1-c)^{1-x_i}}{r c^{x_i} (1-c)^{1-x_i} + (1-r)\, q^{x_i} (1-q)^{1-x_i}}\)

Denote \(P(z_i=1|x_i;\theta)\) by \(\mu_i^{(j+1)}\), the value obtained in the (j+1)-th iteration; for readability, the superscript is again dropped below.
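A sketch of this E-step in Python (the parameter values passed in are arbitrary guesses, not from the original article):

```python
def e_step(x, r, c, q):
    """mu_i = P(z_i = 1 | x_i; theta) for the three-coin model (Bayes' rule above)."""
    mu = []
    for xi in x:
        num = r * c**xi * (1 - c)**(1 - xi)              # z_i = 1: B produced x_i
        den = num + (1 - r) * q**xi * (1 - q)**(1 - xi)  # plus z_i = 0: C produced x_i
        mu.append(num / den)
    return mu

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(e_step(data, r=0.4, c=0.6, q=0.5))  # one mu_i per observation
```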

M-step:

Now that \(P(z_i|x_i;\theta)\) is known, we solve the following optimization problem:

\(J(\theta) = \sum_i \left[ \mu_i \log \frac{p(x_i, z_i=1;\theta)}{\mu_i} + (1-\mu_i) \log \frac{p(x_i, z_i=0;\theta)}{1-\mu_i} \right]\)

\(= \sum_i \left[ \mu_i \log \frac{r c^{x_i} (1-c)^{1-x_i}}{\mu_i} + (1-\mu_i) \log \frac{(1-r)\, q^{x_i} (1-q)^{1-x_i}}{1-\mu_i} \right]\)

Setting \(\frac{\partial J(\theta)}{\partial r} = 0\),

it is easy to get \(r = \frac{1}{m}\sum_i \mu_i\).

Setting \(\frac{\partial J(\theta)}{\partial c} = 0\),

it is equally easy to get \(c = \frac{\sum_i \mu_i x_i}{\sum_i \mu_i}\).

Setting \(\frac{\partial J(\theta)}{\partial q} = 0\),

it is equally easy to get \(q = \frac{\sum_i (1-\mu_i) x_i}{\sum_i (1-\mu_i)}\). See references [1][5].
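Putting the E-step and these three closed-form updates together gives the complete EM iteration for the three-coin problem. A self-contained sketch; the starting values are arbitrary guesses:

```python
def em_three_coins(x, r, c, q, n_iter=100):
    """EM for the three-coin model; theta = (r, c, q) as defined above."""
    m = len(x)
    for _ in range(n_iter):
        # E-step: mu_i = P(z_i = 1 | x_i; theta), from the Bayes formula above
        mu = [r * c**xi * (1 - c)**(1 - xi)
              / (r * c**xi * (1 - c)**(1 - xi)
                 + (1 - r) * q**xi * (1 - q)**(1 - xi))
              for xi in x]
        # M-step: the closed-form maximizers derived above
        r = sum(mu) / m
        c = sum(mi * xi for mi, xi in zip(mu, x)) / sum(mu)
        q = sum((1 - mi) * xi for mi, xi in zip(mu, x)) / sum(1 - mi for mi in mu)
    return r, c, q

data = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(em_three_coins(data, r=0.4, c=0.6, q=0.5))
```

Note that with the fully symmetric start r = c = q = 0.5, every \(\mu_i\) equals 0.5 and the updates settle immediately at r = 0.5 and c = q = 0.6 (the sample mean), while other starting points give different answers, illustrating the sensitivity to initial values mentioned earlier.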

References:

[1] Hangyuan Li, *Statistical Learning Methods*.

[2] Jensen's inequality: http://www.cnblogs.com/naniJser/p/5642288.html

[3] Andrew Ng's lecture notes for the CS229 machine learning course: http://cs229.stanford.edu/notes/cs229-notes8.pdf

[4] A blog post introducing the EM algorithm: http://blog.csdn.net/zouxy09/article/details/8537620

[5] http://chenrudan.github.io/blog/2015/12/02/emexample.html
