EM Algorithm (Part Two): Derivation of the Algorithm

One, Introduction to the EM algorithm

In EM Algorithm (Part One), the problem was introduced through the coin example and the model's objective function. We mentioned that maximum likelihood estimation with latent variables has to be solved with the EM algorithm, and then listed the simple procedure of the EM algorithm. Of course, on first reading the heart of the EM algorithm can still feel confusing, and we only analyzed it briefly. I hope that, having read the previous article, you roughly know the purpose and role of the E-step and the M-step. To deepen our understanding, let us look back at a brief statement of the EM algorithm:

Input: observed variable data $Y$, latent variable data $Z$, joint distribution $P(Y,Z|\theta)$, conditional distribution $P(Z|Y,\theta)$
Output: model parameter $\theta$
(1) Choose an initial value $\theta^{(0)}$ for the parameter and start iterating;
(2) E-step: Let $\theta^{(i)}$ denote the estimate of the parameter $\theta$ at the $i$-th iteration. In the E-step of the $(i+1)$-th iteration, compute:

$$Q(\theta,\theta^{(i)}) = E_Z\big[\log P(Y,Z|\theta)\,\big|\,\color{red}{Y,\theta^{(i)}}\big] = \sum_Z \log P(Y,Z|\theta)\,\color{red}{P(Z|Y,\theta^{(i)})} \tag{1}$$

(3) M-step: Find the $\theta$ that maximizes $Q(\theta,\theta^{(i)})$, which gives the parameter estimate for the $(i+1)$-th iteration, $\theta^{(i+1)}$:

$$\theta^{(i+1)} = \arg\max_\theta Q(\theta,\theta^{(i)}) \tag{2}$$

(4) Repeat step (2) and step (3) until convergence.
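
To make the four steps easier to picture, here is a minimal sketch of the iteration skeleton in Python. This is my own illustration, not part of the algorithm statement above: the `e_step` and `m_step` callables stand in for whatever computes formulas (1) and (2) for a concrete model. A concrete instantiation for a coin mixture appears at the end of this section.

```python
import numpy as np

def em(theta0, e_step, m_step, max_iter=100, eps=1e-6):
    """Generic EM loop: (1) init, (2) E-step, (3) M-step, (4) repeat until convergence."""
    theta = theta0                        # step (1): initial value theta^(0)
    for _ in range(max_iter):
        stats = e_step(theta)             # step (2): expectations under P(Z | Y, theta^(i))
        theta_new = m_step(stats)         # step (3): argmax_theta of Q(theta, theta^(i))
        if np.linalg.norm(np.asarray(theta_new) - np.asarray(theta)) < eps:
            return theta_new              # step (4): stop once the update is tiny
        theta = theta_new
    return theta
```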

The function $Q(\theta,\theta^{(i)})$ in the E-step above is the core of the EM algorithm and is called the Q function.
The Q function is the expectation of the log-likelihood of the complete data, $\log P(Y,Z|\theta)$, taken with respect to the conditional probability distribution $P(Z|Y,\theta^{(i)})$ of the unobserved data $Z$ given the observed data $Y$ and the current parameter $\theta^{(i)}$.
Let us look at the Q function a little more closely, because it packs in several key ideas. First, it is clearly an expectation, and that poses no problem; second, this expectation is the expectation of a function (the log-likelihood of the complete data) with respect to a probability distribution (the conditional distribution of the unobserved data $Z$ given $Y$ and $\theta^{(i)}$). Reading this, you may not yet be clear on what the expectation of a function with respect to a probability distribution means, so I insert a small aside here to introduce it; if you already understand it, feel free to skip ahead:

Knowledge Point one: conditional mathematical expectation

The expectation of a function with respect to a probability distribution mentioned above is called the conditional mathematical expectation in mathematics.
First, conditional probability, which we are already familiar with, is the probability that the event $\{Y=y_j\}$ occurs given that the event $\{X=x_i\}$ has occurred, written $P\{Y=y_j \mid X=x_i\}$;
The conditional expectation is the expected value of a real random variable with respect to a conditional probability distribution. Let $X$ and $Y$ be discrete random variables; then the conditional expectation of $X$ given the event $\{Y=y\}$ is a function of $y$ defined on the range of values of $Y$:

$$E[X \mid Y=y] = \sum_i x_i\,P(X=x_i \mid Y=y) \tag{3}$$

Personally, I find it helpful to read this as a weighted average under the conditional probability distribution.
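
As a tiny numeric illustration of this weighted-average reading (the joint probability table below is invented purely for illustration):

```python
# Toy check of E[X | Y=y] = sum_i x_i * P(X=x_i | Y=y) for a made-up joint table.
import numpy as np

x_vals = np.array([0.0, 1.0, 2.0])
joint = np.array([[0.10, 0.20],       # P(X=0, Y=0), P(X=0, Y=1)
                  [0.30, 0.10],       # P(X=1, Y=0), P(X=1, Y=1)
                  [0.10, 0.20]])      # P(X=2, Y=0), P(X=2, Y=1)

for j, y in enumerate((0, 1)):
    p_y = joint[:, j].sum()                         # marginal P(Y=y)
    cond = joint[:, j] / p_y                        # conditional P(X=x_i | Y=y)
    print(f"E[X | Y={y}] = {(x_vals * cond).sum():.3f}")   # weighted average of x_i
```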

Now let us return to the Q function. Looking at formula (1) in the E-step, the function $\log P(Y,Z|\theta)$ is a function of $Z$, and conditioning on $Y,\theta^{(i)}$ means the latent variable $Z$ is taken under this condition, that is, under the probability distribution $P(Z|Y,\theta^{(i)})$; with this in mind, the transformation shown in red in formula (1) is easy to understand. The log-likelihood $\log P(Y,Z|\theta)$ is the log-likelihood of the complete data, which contains the latent variable $Z$, so we take its conditional mathematical expectation over $Z$, weighting by the conditional probability distribution of $Z$.
Having obtained in the E-step the conditional mathematical expectation over the latent variable, we then look for the value of the model parameter $\theta$ that makes the Q function as large as possible (for example by setting its derivative to zero, as in maximum likelihood estimation). So, in the M-step, we maximize $Q(\theta,\theta^{(i)})$ to obtain $\theta^{(i+1)}$, completing one iteration $\theta^{(i)} \to \theta^{(i+1)}$. Later we show that each iteration increases the likelihood (or at least does not decrease it) until a local optimum is reached; the second part provides the construction behind this. Finally, for the stopping condition one usually chooses relatively small positive values $\epsilon_1,\epsilon_2$ and stops iterating once $||\theta^{(i+1)}-\theta^{(i)}|| < \epsilon_1$ or $||Q(\theta^{(i+1)},\theta^{(i)})-Q(\theta^{(i)},\theta^{(i)})|| < \epsilon_2$ is satisfied.
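
To tie the E-step and the M-step together, here is a sketch of EM for a simple two-coin mixture, in the spirit of the coin problem from the previous article (the data, the number of tosses, and the starting values are all invented here, and the closed-form M-step is specific to this toy model). The E-step computes the responsibilities $P(Z|y_k,\theta^{(i)})$ that weight the Q function, and the M-step maximizes Q in closed form:

```python
import numpy as np
from math import comb

heads = np.array([9, 8, 2, 1, 7])     # heads observed in each group of tosses (made up)
n = 10                                # tosses per group (made up)

def binom_pmf(k, n, p):
    # P(k heads in n tosses of a coin with bias p)
    return comb(n, int(k)) * p**k * (1 - p)**(n - k)

def em_two_coins(heads, n, pi=0.5, p1=0.6, p2=0.4, n_iter=50):
    for _ in range(n_iter):
        # E-step: responsibilities r_k = P(Z = coin 1 | y_k, theta^(i)),
        # i.e. the conditional distribution that weights the Q function in (1).
        like1 = np.array([pi * binom_pmf(h, n, p1) for h in heads])
        like2 = np.array([(1 - pi) * binom_pmf(h, n, p2) for h in heads])
        r = like1 / (like1 + like2)
        # M-step: maximize Q(theta, theta^(i)); for this model the argmax has a
        # closed form as responsibility-weighted averages.
        pi = r.mean()
        p1 = (r * heads).sum() / (r * n).sum()
        p2 = ((1 - r) * heads).sum() / ((1 - r) * n).sum()
    return pi, p1, p2

print(em_two_coins(heads, n))
```

With these made-up numbers the two estimated biases separate, one pulled toward the groups with many heads and one toward the groups with few heads, which is the behaviour the responsibilities are meant to produce.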

Two, Derivation of the EM algorithm

Why can the EM algorithm approximate the maximum likelihood estimate of the observed data? We face a probabilistic model with latent variables, and the goal is to maximize the log-likelihood function of the observed data (the incomplete data) $Y$ with respect to the parameter $\theta$, that is, to maximize:

$$L(\theta) = \log P(Y|\theta) = \log \sum_Z P(Y,Z|\theta) = \log \left( \sum_Z P(Y|Z,\theta)\,P(Z|\theta) \right) \tag{4}$$

The difficulty is that equation (4) contains unobserved data and the logarithm of a sum (or of an integral).
The EM algorithm approaches the maximization of $L(\theta)$ progressively by iteration. Suppose that after the $i$-th iteration the estimate of $\theta$ is $\theta^{(i)}$. Can a new estimate $\theta$ make $L(\theta)$ increase, i.e. $L(\theta) > L(\theta^{(i)})$, and gradually approach the maximum? To answer this we consider the difference between the two:

$$L(\theta)-L(\theta^{(i)}) = \log \left( \sum_Z P(Y|Z,\theta)\,P(Z|\theta) \right) - \log P(Y|\theta^{(i)}) \tag{5}$$

Formula (5) needs to be transformed, and the transformation relies on the Jensen inequality.

Knowledge Point two: Jensen inequality

$$\log \sum_{j}\lambda_{j} y_j \ \ge\ \sum_j \lambda_j \log y_j, \qquad \text{where } \lambda_j \ge 0,\ \sum_j \lambda_j = 1$$
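
As a quick numeric sanity check (the numbers are picked only for illustration), take $\lambda_1=\lambda_2=\frac{1}{2}$, $y_1=1$, $y_2=4$, using the natural logarithm:

$$\log\left(\tfrac{1}{2}\cdot 1+\tfrac{1}{2}\cdot 4\right)=\log 2.5 \approx 0.916 \ \ge\ \tfrac{1}{2}\log 1+\tfrac{1}{2}\log 4 \approx 0.693$$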

With the Jensen inequality understood, we continue with formula (5). First transform it: inside the first term, multiply and divide by $\color{blue}{P(Z|Y,\theta^{(i)})}$; for clarity, this factor is marked in blue below:

$$L(\theta)-L(\theta^{(i)}) = \log \left( \sum_Z \color{blue}{P(Z|Y,\theta^{(i)})}\, \frac{P(Y|Z,\theta)\,P(Z|\theta)}{\color{blue}{P(Z|Y,\theta^{(i)})}} \right) - \log P(Y|\theta^{(i)}) \ \ge\ \sum_Z P(Z|Y,\theta^{(i)}) \log \frac{P(Y|Z,\theta)\,P(Z|\theta)}{P(Z|Y,\theta^{(i)})} - \log P(Y|\theta^{(i)}) = \sum_Z P(Z|Y,\theta^{(i)}) \log \frac{P(Y|Z,\theta)\,P(Z|\theta)}{P(Z|Y,\theta^{(i)})\,\color{forestgreen}{P(Y|\theta^{(i)})}} \tag{6}$$

(the inequality is Jensen's, applied with $\lambda_Z = P(Z|Y,\theta^{(i)})$; the last step moves $\log P(Y|\theta^{(i)})$, marked in green, inside the sum using $\sum_Z P(Z|Y,\theta^{(i)})=1$). Here we define

$$B(\theta,\theta^{(i)}) = L(\theta^{(i)}) + \sum_Z \color{blue}{P(Z|Y,\theta^{(i)})} \log \frac{P(Y|Z,\theta)\,P(Z|\theta)}{\color{blue}{P(Z|Y,\theta^{(i)})}\,\color{forestgreen}{P(Y|\theta^{(i)})}} \tag{7}$$

Then we have:

$$L(\theta) \ge B(\theta,\theta^{(i)}) \tag{8}$$

That is, the function $B(\theta,\theta^{(i)})$ is a lower bound of $L(\theta)$; moreover, from formula (7) we know:

$$L(\theta^{(i)}) = B(\theta^{(i)},\theta^{(i)})$$

Therefore, any $\theta$ that increases $B(\theta,\theta^{(i)})$ also increases $L(\theta)$. To make $L(\theta)$ grow as much as possible, we choose the $\theta^{(i+1)}$ that maximizes $B(\theta,\theta^{(i)})$, namely:

$$\theta^{(i+1)} = \arg\max_\theta B(\theta,\theta^{(i)}) \tag{9}$$

Now we solve for $\theta^{(i+1)}$, dropping the terms in (7) that are constant with respect to $\theta$:

$$\begin{aligned}\theta^{(i+1)} &= \arg\max_\theta \left( L(\theta^{(i)}) + \sum_Z P(Z|Y,\theta^{(i)}) \log \frac{P(Y|Z,\theta)\,P(Z|\theta)}{P(Z|Y,\theta^{(i)})\,P(Y|\theta^{(i)})} \right)\\ &= \arg\max_\theta \sum_Z P(Z|Y,\theta^{(i)}) \log \big( P(Y|Z,\theta)\,P(Z|\theta) \big)\\ &= \arg\max_\theta \sum_Z P(Z|Y,\theta^{(i)}) \log P(Y,Z|\theta)\\ &= \arg\max_\theta Q(\theta,\theta^{(i)}) \end{aligned}\tag{10}$$
Equation (10) is equivalent to one iteration of the EM algorithm, namely computing the Q function and maximizing it. So we see that the EM algorithm maximizes the log-likelihood function indirectly, by approximately maximizing a lower bound: it repeatedly constructs a local lower bound at the current estimate and then maximizes that bound.
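
This lower-bound picture can also be checked numerically. The sketch below uses a toy two-component model in which only the mixing weight plays the role of $\theta$ and the emission probabilities 0.8 and 0.2 are fixed; the data and all numbers are invented, and the per-observation bound of formula (7) is simply summed over the samples. It prints $L(\theta)$ and $B(\theta,\theta^{(i)})$ for a few values of $\theta$ and shows $B \le L$ with equality at $\theta=\theta^{(i)}$:

```python
import numpy as np

y = np.array([1, 1, 0, 1])            # observed data (made up)
emit = {1: 0.8, 0: 0.2}               # fixed P(Y=1 | Z=z) for z = 1 and z = 0

def p_y_given_z(yk, z):
    q = emit[z]
    return q if yk == 1 else 1 - q

def log_lik(theta):
    # L(theta) = sum_k log sum_z P(y_k | z) P(z | theta), with P(Z=1|theta) = theta
    total = 0.0
    for yk in y:
        total += np.log(sum(p_y_given_z(yk, z) * (theta if z == 1 else 1 - theta)
                            for z in (0, 1)))
    return total

def lower_bound(theta, theta_i):
    # B(theta, theta_i) from formula (7), summed over the observations
    total = log_lik(theta_i)
    for yk in y:
        p_yk_i = sum(p_y_given_z(yk, z) * (theta_i if z == 1 else 1 - theta_i)
                     for z in (0, 1))
        for z in (0, 1):
            prior_i = theta_i if z == 1 else 1 - theta_i
            post = p_y_given_z(yk, z) * prior_i / p_yk_i      # P(z | y_k, theta_i)
            prior = theta if z == 1 else 1 - theta
            total += post * np.log(p_y_given_z(yk, z) * prior / (post * p_yk_i))
    return total

theta_i = 0.3
for theta in (0.1, 0.3, 0.5, 0.9):
    print(f"theta={theta}: L={log_lik(theta):.4f}  B={lower_bound(theta, theta_i):.4f}")
```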

Three, Applications of the EM algorithm

The EM algorithm has many applications, for example in classification, regression, and labeling tasks. The most widely used are the Gaussian mixture model (GMM) and the training problem of the hidden Markov model (HMM), among others.
