Excerpt from: https://www.zhihu.com/question/27976634
Briefly: why use the EM algorithm?
Suppose a class has 50 boys and 50 girls, with the boys standing on the left and the girls on the right. We assume that the boys' heights follow one normal distribution and the girls' heights follow another. We can then use maximum likelihood estimation (MLE) on the 50 male samples and the 50 female samples separately to estimate the parameters of the two normal distributions.
But now let's make things a little more complicated: the 50 boys and 50 girls are mixed together. We have height data for 100 people, but we do not know which of them are boys and which are girls.
Things get awkward at this point, because we face a chicken-and-egg problem: only if we knew the exact parameters of the male and female height distributions could we judge whether each person is more likely to be a boy or a girl; but on the other hand, only if we knew whether each person is a boy or a girl could we estimate the parameters of the two normal distributions accurately.
So someone thought of starting somewhere and solving the problem iteratively: first set initial values for the parameters of the male and female height distributions; then use these parameters to decide whether each sample (person) is more likely male or female; then re-estimate the parameters from the samples as labeled. Repeat this process until it stabilizes. This algorithm is the EM algorithm.
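The iterative procedure above can be sketched in a few lines. This is a minimal hard-assignment variant (each person is labeled with whichever distribution currently fits better; full EM would use soft probabilities instead), and the heights are synthetic data generated from assumed "true" parameters purely for illustration:

```python
import math
import random

random.seed(0)

# Synthetic stand-in for the 100 height measurements (cm); the generating
# parameters (175/6 and 162/5) are assumptions for illustration only.
heights = [random.gauss(175, 6) for _ in range(50)] + \
          [random.gauss(162, 5) for _ in range(50)]

def norm_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Step 1: initial guesses for the two normal distributions.
mu_b, sigma_b = 180.0, 10.0   # "boys"
mu_g, sigma_g = 155.0, 10.0   # "girls"

for _ in range(50):
    # Step 2 (E-like): label each person by whichever distribution
    # currently explains their height better (hard assignment).
    assigned_b = [h for h in heights
                  if norm_pdf(h, mu_b, sigma_b) >= norm_pdf(h, mu_g, sigma_g)]
    assigned_g = [h for h in heights
                  if norm_pdf(h, mu_b, sigma_b) < norm_pdf(h, mu_g, sigma_g)]
    # Step 3 (M-like): re-estimate each distribution by MLE on its labels.
    mu_b = sum(assigned_b) / len(assigned_b)
    sigma_b = math.sqrt(sum((h - mu_b) ** 2 for h in assigned_b) / len(assigned_b))
    mu_g = sum(assigned_g) / len(assigned_g)
    sigma_g = math.sqrt(sum((h - mu_g) ** 2 for h in assigned_g) / len(assigned_g))

print(round(mu_b, 1), round(mu_g, 1))
```

After a few iterations the two estimated means settle near the two groups' true means, even though no sample was ever labeled by hand.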
Why use an EM algorithm?
In general, we want to use maximum likelihood estimation (MLE) to find the parameters that maximize the likelihood. The problem is that the MLE of the original likelihood function may be intractable (the function is too complex, some data are missing, and so on). Since the missing data prevent us from applying MLE directly, we can replace the missing data with their expected values, which depend on the probability distribution of the missing data. We can then approximate the maximum of the original function by maximizing the expectation of the log-likelihood over the missing data (the mathematical proof is involved), and the two steps of EM follow naturally from this.
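Written out, the two steps are (with $X$ the observed data, $Z$ the missing/hidden data, and $\theta^{(t)}$ the current parameter guess):

```latex
% E-step: expected complete-data log-likelihood under the current guess
Q\bigl(\theta \mid \theta^{(t)}\bigr)
  = \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\!\left[\log p(X, Z \mid \theta)\right]

% M-step: re-estimate the parameters by maximizing that expectation
\theta^{(t+1)} = \arg\max_{\theta}\; Q\bigl(\theta \mid \theta^{(t)}\bigr)
```

Alternating these two steps never decreases the observed-data likelihood, which is why the iteration stabilizes.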
A recommended read is a Nature Biotechnology tutorial that builds intuition for the EM algorithm with a coin-tossing example:

Do, C. B., & Batzoglou, S. (2008). What is the expectation maximization algorithm? Nature Biotechnology, 26(8), 897.
There are two coins, A and B, and the parameters to estimate are their respective probabilities of landing heads. The observation process is: randomly select A or B and toss it 10 times; repeat this 5 times.
If you know whether A or B was selected in each round, you can estimate the two probabilities directly (see panel a of the paper's figure). If you do not know which coin was selected (a hidden variable) and only observe the 50 tosses from the 5 rounds, then the heads probabilities of A and B cannot be estimated directly. This is where the EM algorithm comes in (see panel b).
The original paper is recommended; it has no complex mathematical formulas and is easy to follow.
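The two-coin procedure can be sketched as follows. The heads counts below follow the worked example in Do & Batzoglou (2008), but any counts would do; here the E-step is the proper soft version, crediting each round's tosses to both coins in proportion to the posterior probability of each coin:

```python
from math import comb

# Heads counts in 5 rounds of 10 tosses each.
heads = [5, 9, 8, 4, 7]
n = 10

def binom(h, p):
    """Probability of h heads in n tosses for a coin with P(heads) = p."""
    return comb(n, h) * p ** h * (1 - p) ** (n - h)

theta_a, theta_b = 0.6, 0.5   # initial guesses for P(heads) of A and B

for _ in range(100):
    # E-step: expected heads/tails counts credited to each coin.
    exp_a_h = exp_a_t = exp_b_h = exp_b_t = 0.0
    for h in heads:
        la, lb = binom(h, theta_a), binom(h, theta_b)
        w_a = la / (la + lb)          # posterior prob. this round used coin A
        exp_a_h += w_a * h
        exp_a_t += w_a * (n - h)
        exp_b_h += (1 - w_a) * h
        exp_b_t += (1 - w_a) * (n - h)
    # M-step: MLE of each coin's bias from its expected counts.
    theta_a = exp_a_h / (exp_a_h + exp_a_t)
    theta_b = exp_b_h / (exp_b_h + exp_b_t)

print(round(theta_a, 2), round(theta_b, 2))
```

Starting from the rough guesses 0.6 and 0.5, the iteration converges to the values reported in the paper (about 0.80 for A and 0.52 for B).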
Excerpt from: http://blog.csdn.net/zouxy09/article/details/8537620
An alternative way to understand the EM algorithm
The coordinate ascent method:
The figure (not reproduced here) shows the zigzag path of the iterative optimization: each step moves closer to the optimum, and the path always runs parallel to a coordinate axis, because each step optimizes only one variable.
This is like finding the extremum of a function in the x-y coordinate system when the function cannot be differentiated directly, so gradient descent does not apply. However, once one variable is fixed, the optimum over the other can be obtained by differentiation. The coordinate ascent method therefore fixes one variable at a time, optimizes over the other, and gradually approaches the extremum. This corresponds to EM as follows. E-step: fix θ, optimize Q; M-step: fix Q, optimize θ. Alternating the two pushes the objective toward its maximum.
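The alternation above can be sketched on a toy concave function (the function is an assumption chosen for illustration, not anything from the EM derivation): for f(x, y) = -(x² + y² + xy - 4x - 5y), fixing y and setting ∂f/∂x = 0 gives x = (4 - y)/2, and fixing x gives y = (5 - x)/2, so each update is an exact one-variable optimization, just like the "fix θ, optimize Q / fix Q, optimize θ" alternation:

```python
# Coordinate ascent on f(x, y) = -(x**2 + y**2 + x*y - 4*x - 5*y).
# Each step moves parallel to one axis: it exactly optimizes one
# coordinate while holding the other fixed.
x, y = 0.0, 0.0
for _ in range(50):
    x = (4 - y) / 2   # optimize x with y fixed
    y = (5 - x) / 2   # optimize y with x fixed

print(round(x, 4), round(y, 4))  # → 1.0 2.0, the maximizer of f
```

The iterates converge geometrically to (1, 2), tracing exactly the axis-parallel staircase path the figure describes.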
In short: when there is a hidden variable, maximizing the likelihood directly with gradient methods is hard, so the EM algorithm first "guesses" the value of the hidden variable (by taking its expectation) and then maximizes the likelihood given that guess, alternating the two until it finds the maximum.