Chapter 9 The EM Algorithm and Its Generalization

The EM algorithm is an iterative algorithm used for maximum likelihood estimation, or maximum a posteriori estimation, of the parameters of probabilistic models that contain hidden (latent) variables. Each iteration of the EM algorithm consists of two steps: the E-step (expectation) and the M-step (maximization), which is why it is called the expectation-maximization algorithm, or EM algorithm for short.
9.1 Introduction to the EM algorithm

In general, Y denotes the data of the observed random variable and Z denotes the data of the hidden random variable. Y and Z together are called the complete data, and the observed data Y alone is also called the incomplete data. Given the observed data Y, suppose its probability distribution is P(Y|θ), where θ is the model parameter to be estimated. Then the likelihood function of the incomplete data Y is P(Y|θ), with log-likelihood L(θ) = log P(Y|θ). Suppose further that the joint probability distribution of Y and Z is P(Y,Z|θ); then the log-likelihood function of the complete data is log P(Y,Z|θ). The likelihood function of the observed data is

P(Y|θ) = Σ_Z P(Y,Z|θ) = Σ_Z P(Z|θ) P(Y|Z,θ).
The EM algorithm computes the maximum likelihood estimate of L(θ) = log P(Y|θ) by iteration. Each iteration consists of two steps: the E-step, which computes an expectation, and the M-step, which performs a maximization.
Definition 9.1 (Q function) The expectation of the complete-data log-likelihood log P(Y,Z|θ) with respect to the conditional distribution P(Z|Y,θ^(i)) of the unobserved data Z given the observed data Y and the current parameter estimate θ^(i) is called the Q function, i.e.

Q(θ, θ^(i)) = E_Z[ log P(Y,Z|θ) | Y, θ^(i) ] = Σ_Z P(Z|Y,θ^(i)) log P(Y,Z|θ).
Remarks on the EM algorithm:

Step (1): the initial values of the parameters can be chosen arbitrarily, but note that the EM algorithm is sensitive to the initial value.

Step (2): the E-step computes Q(θ, θ^(i)). In the Q function, Z is the unobserved data and Y is the observed data. Note that the first argument θ of Q(θ, θ^(i)) denotes the parameter to be maximized over, while the second argument θ^(i) denotes the current estimate of the parameter. Each iteration in effect computes the Q function and then maximizes it.

Step (3): the M-step maximizes Q(θ, θ^(i)) to obtain θ^(i+1) = argmax_θ Q(θ, θ^(i)), completing one iteration θ^(i) → θ^(i+1). It will be shown later that each iteration increases the likelihood function or reaches a local extremum.

Step (4) gives the condition for stopping the iteration, usually that the change in the parameter estimates (or in the Q function values) between successive iterations is smaller than some small positive number; if the condition is satisfied, the iteration stops.
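As a sketch of how this loop can be organized in code, the following Python skeleton is one possible driver. The callables e_step and m_step are hypothetical placeholders for the model-specific computations (forming the quantities needed for Q and maximizing it); they are not part of the text.

```python
import numpy as np

def em(theta0, e_step, m_step, tol=1e-6, max_iter=1000):
    """Generic EM driver (a sketch; e_step and m_step are model-specific callables).

    e_step(theta) should return whatever sufficient statistics the M-step needs,
    e.g. posterior probabilities of the hidden variable Z given Y and theta.
    m_step(stats) should return the theta that maximizes Q(theta, theta_i).
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        stats = e_step(theta)                       # E-step: expectations under P(Z | Y, theta_i)
        theta_new = np.asarray(m_step(stats))       # M-step: argmax_theta Q(theta, theta_i)
        if np.max(np.abs(theta_new - theta)) < tol: # stopping rule as in step (4)
            return theta_new
        theta = theta_new
    return theta
```

The driver only encodes the alternation and the stopping rule; everything model-specific lives in the two callables.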
Derivation of the EM algorithm
The EM algorithm can be derived by approximately solving the problem of maximizing the log-likelihood function of the observed data, which makes the role of the EM algorithm clear. Faced with a probabilistic model containing hidden variables, the goal is to maximize the log-likelihood function of the observed (incomplete) data Y with respect to the parameter θ, that is, to maximize L(θ) = log P(Y|θ) = log Σ_Z P(Y|Z,θ)P(Z|θ). The main difficulty of this maximization is that the formula involves the unobserved data and contains the logarithm of a sum (or integral). The EM algorithm approaches the maximum of L(θ) progressively by iteration, and each iteration must satisfy the requirement that the new estimate θ increases L(θ) and gradually approaches the maximum. Consider the difference between the new value L(θ) and the value after the i-th iteration:

L(θ) − L(θ^(i)) = log( Σ_Z P(Y|Z,θ)P(Z|θ) ) − log P(Y|θ^(i)).
Using Jensen's inequality, a lower bound B(θ, θ^(i)) of L(θ) can be derived, with L(θ^(i)) = B(θ^(i), θ^(i)). Choosing θ^(i+1) to maximize B(θ, θ^(i)) is equivalent to one iteration of the EM algorithm, that is, to computing the Q function and maximizing it. The EM algorithm is therefore an algorithm that maximizes the log-likelihood function approximately by repeatedly maximizing a lower bound.

Intuitive interpretation of the EM algorithm: the upper curve is L(θ) and the lower curve is B(θ, θ^(i)), the lower bound of the log-likelihood function L(θ); the two are equal at θ = θ^(i). The EM algorithm finds the next point θ^(i+1) that maximizes the function B(θ, θ^(i)), which also maximizes the function Q(θ, θ^(i)). The increase of B guarantees that the log-likelihood function L also increases at each iteration. The EM algorithm then recomputes the Q function at the point θ^(i+1) for the next iteration, and the log-likelihood function L keeps increasing throughout this process. It can also be seen from the figure that the EM algorithm cannot guarantee finding the global optimum.
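For reference, the standard lower-bound construction summarized in the previous paragraph can be written out as follows (a sketch consistent with the notation above):

```latex
\begin{aligned}
L(\theta)-L(\theta^{(i)})
 &= \log\Big(\sum_Z P(Z\mid Y,\theta^{(i)})\,
      \frac{P(Y\mid Z,\theta)\,P(Z\mid\theta)}{P(Z\mid Y,\theta^{(i)})}\Big)
    -\log P(Y\mid\theta^{(i)}) \\
 &\ge \sum_Z P(Z\mid Y,\theta^{(i)})
      \log\frac{P(Y\mid Z,\theta)\,P(Z\mid\theta)}
               {P(Z\mid Y,\theta^{(i)})\,P(Y\mid\theta^{(i)})}
      \qquad\text{(Jensen's inequality)} .
\end{aligned}
```

Setting B(θ, θ^(i)) equal to L(θ^(i)) plus the sum on the right-hand side gives L(θ) ≥ B(θ, θ^(i)), with equality at θ = θ^(i). Dropping the terms of B that do not depend on θ, maximizing B(θ, θ^(i)) over θ is the same as maximizing Σ_Z P(Z|Y,θ^(i)) log P(Y,Z|θ), i.e. the Q function.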
Application of the EM algorithm in unsupervised learning

Sometimes the training data has only inputs and no corresponding outputs; learning a model from such data is an unsupervised learning problem. The EM algorithm can be used for unsupervised learning of generative models. A generative model is represented by the joint probability distribution P(X,Y), and the training data of unsupervised learning can be regarded as data generated by this joint probability distribution, where X is the observed data and Y is the unobserved data.

9.2 Convergence of the EM algorithm

Theorem 9.1 Let P(Y|θ) be the likelihood function of the observed data, let θ^(i) (i = 1, 2, ...) be the sequence of parameter estimates obtained by the EM algorithm, and let P(Y|θ^(i)) (i = 1, 2, ...) be the corresponding sequence of likelihood values. Then P(Y|θ^(i)) is monotonically increasing, i.e.

P(Y|θ^(i+1)) ≥ P(Y|θ^(i)).
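A short sketch of the standard argument behind Theorem 9.1, using only the Q function defined above together with H(θ, θ^(i)) = Σ_Z P(Z|Y,θ^(i)) log P(Z|Y,θ):

```latex
\begin{aligned}
\log P(Y\mid\theta)
 &= \sum_Z P(Z\mid Y,\theta^{(i)})\log P(Y,Z\mid\theta)
  - \sum_Z P(Z\mid Y,\theta^{(i)})\log P(Z\mid Y,\theta)
  = Q(\theta,\theta^{(i)}) - H(\theta,\theta^{(i)}), \\
\log P(Y\mid\theta^{(i+1)}) - \log P(Y\mid\theta^{(i)})
 &= \big[Q(\theta^{(i+1)},\theta^{(i)}) - Q(\theta^{(i)},\theta^{(i)})\big]
  - \big[H(\theta^{(i+1)},\theta^{(i)}) - H(\theta^{(i)},\theta^{(i)})\big].
\end{aligned}
```

The first bracket is nonnegative because θ^(i+1) is chosen to maximize Q(θ, θ^(i)); the second bracket is nonpositive by Jensen's inequality (it is the negative of a KL divergence), so the difference is nonnegative and the likelihood sequence is monotone.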
Theorem 9.2 Let P(Y|θ) be the likelihood function of the observed data, let θ^(i) (i = 1, 2, ...) be the sequence of parameter estimates obtained by the EM algorithm, and let L(θ^(i)) = log P(Y|θ^(i)) (i = 1, 2, ...) be the corresponding sequence of log-likelihood values. (1) If P(Y|θ) has an upper bound, then L(θ^(i)) converges to some value L*. (2) When the functions Q and L satisfy certain conditions, the limit θ* of the parameter estimate sequence θ^(i) obtained by the EM algorithm is a stationary point of L(θ).

The convergence of the EM algorithm involves both the convergence of the log-likelihood sequence L(θ^(i)) and the convergence of the parameter estimate sequence θ^(i); the former does not imply the latter. Moreover, the theorem only guarantees that the parameter estimate sequence converges to a stationary point of the log-likelihood function, not to a maximum point. In applications the choice of the initial value therefore becomes very important; a common practice is to run the iteration from several different initial values, compare the resulting estimates, and choose the best one.

9.3 Application of the EM algorithm to learning Gaussian mixture models

Definition 9.2 (Gaussian mixture model) A Gaussian mixture model is a probability distribution model of the form

P(y|θ) = Σ_{k=1}^{K} α_k φ(y|θ_k),

where α_k ≥ 0 and Σ_{k=1}^{K} α_k = 1 are the mixing coefficients, and φ(y|θ_k) is the Gaussian density with parameters θ_k = (μ_k, σ_k²), called the k-th component model.
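To make the definition concrete, here is a minimal Python sketch of a one-dimensional Gaussian mixture density; the function name and arguments are illustrative, not from the text.

```python
import numpy as np

def gaussian_mixture_density(y, alpha, mu, sigma):
    """Density of a 1-D Gaussian mixture: sum_k alpha_k * phi(y | mu_k, sigma_k).

    alpha, mu, sigma are length-K arrays; alpha is nonnegative and sums to 1.
    """
    y = np.atleast_1d(y)[:, None]                    # shape (N, 1)
    phi = np.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
    return phi @ alpha                               # shape (N,)

# Example: a two-component mixture evaluated at y = 0
# gaussian_mixture_density(0.0, np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0]))
```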
The EM algorithm for parameter estimation of Gaussian mixture models

Suppose the observed data y_1, y_2, ..., y_N are generated by a Gaussian mixture model.

1. Specify the hidden variables and write down the complete-data log-likelihood. One can imagine that each observation y_j is produced as follows: first the k-th Gaussian component model is selected with probability α_k, and then y_j is generated according to the probability distribution of that k-th component model. The observation y_j is known, but which component it came from is unknown, k = 1, 2, ..., K. The hidden variable is therefore defined as

γ_jk = 1 if the j-th observation comes from the k-th component model, and γ_jk = 0 otherwise, for j = 1, 2, ..., N and k = 1, 2, ..., K.
γ_jk is a 0-1 random variable. The complete data is then (y_j, γ_j1, γ_j2, ..., γ_jK), j = 1, 2, ..., N. The likelihood function of the complete data is

P(y, γ | θ) = Π_{j=1}^{N} Π_{k=1}^{K} [ α_k φ(y_j|θ_k) ]^{γ_jk},

and the corresponding log-likelihood function is

log P(y, γ | θ) = Σ_{j=1}^{N} Σ_{k=1}^{K} γ_jk [ log α_k + log φ(y_j|θ_k) ].

2. The E-step of the EM algorithm: determine the Q function,

Q(θ, θ^(i)) = E[ log P(y, γ | θ) | y, θ^(i) ] = Σ_{j=1}^{N} Σ_{k=1}^{K} γ̂_jk [ log α_k + log φ(y_j|θ_k) ],
where γ̂_jk = E[γ_jk | y, θ^(i)] is the probability that the j-th observation comes from the k-th component model under the current model parameters; it is called the responsibility of component model k for the observation y_j and is given by

γ̂_jk = α_k φ(y_j|θ_k) / Σ_{k'=1}^{K} α_{k'} φ(y_j|θ_{k'}), j = 1, 2, ..., N, k = 1, 2, ..., K,

evaluated at the current parameters θ^(i).

3. The M-step of the EM algorithm. The M-step of each iteration maximizes the function Q(θ, θ^(i)) with respect to θ to obtain the model parameters of the next iteration. Taking partial derivatives, setting them to 0, and using the constraint Σ_k α_k = 1, one obtains

μ̂_k = Σ_{j=1}^{N} γ̂_jk y_j / Σ_{j=1}^{N} γ̂_jk,
σ̂_k² = Σ_{j=1}^{N} γ̂_jk (y_j − μ̂_k)² / Σ_{j=1}^{N} γ̂_jk,
α̂_k = Σ_{j=1}^{N} γ̂_jk / N, for k = 1, 2, ..., K.

A small numerical sketch of these E-step and M-step updates is given below.
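The following Python sketch implements the E-step and M-step above for a one-dimensional Gaussian mixture. The names (gmm_em, gamma_hat, etc.), the crude initialization, and the parameter-change stopping rule are illustrative choices, not prescribed by the text.

```python
import numpy as np

def gmm_em(y, K, n_iter=200, tol=1e-6, seed=0):
    """EM for a 1-D Gaussian mixture: returns (alpha, mu, sigma2)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    N = y.size
    # Crude initialization (recall that EM is sensitive to the initial value).
    alpha = np.full(K, 1.0 / K)
    mu = rng.choice(y, size=K, replace=False)
    sigma2 = np.full(K, y.var() + 1e-6)

    for _ in range(n_iter):
        # E-step: responsibilities gamma_hat[j, k] = P(component k | y_j, current theta).
        phi = np.exp(-(y[:, None] - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        weighted = alpha * phi                        # shape (N, K)
        gamma_hat = weighted / weighted.sum(axis=1, keepdims=True)

        # M-step: closed-form updates from the formulas above.
        Nk = gamma_hat.sum(axis=0)                    # effective number of points per component
        mu_new = (gamma_hat * y[:, None]).sum(axis=0) / Nk
        sigma2_new = (gamma_hat * (y[:, None] - mu_new) ** 2).sum(axis=0) / Nk
        alpha_new = Nk / N

        converged = np.max(np.abs(mu_new - mu)) < tol  # simple stopping rule
        mu, sigma2, alpha = mu_new, sigma2_new, alpha_new
        if converged:
            break

    return alpha, mu, sigma2

# Example usage on synthetic data drawn from two Gaussians.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
    print(gmm_em(data, K=2))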
9.4 Generalization of the EM algorithm

The EM algorithm can also be interpreted as a maximization-maximization algorithm for the F function, and this interpretation leads to several variants and generalizations, such as the generalized expectation maximization (GEM) algorithm.

The maximization-maximization algorithm for the F function

Definition 9.3 (F function) Suppose the probability distribution of the hidden-variable data Z is P̃(Z). Define the function F(P̃, θ) of the distribution P̃ and the parameter θ as

F(P̃, θ) = E_P̃[ log P(Y, Z | θ) ] + H(P̃),

called the F function, where H(P̃) = −E_P̃[ log P̃(Z) ] is the entropy of the distribution P̃(Z).

Lemma 9.1 For fixed θ, there is a unique distribution P̃_θ that maximizes F(P̃, θ), and P̃_θ is given by

P̃_θ(Z) = P(Z | Y, θ).
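A brief check of this lemma using only the definitions above (a sketch, not the book's proof):

```latex
F(\tilde P,\theta)
 = E_{\tilde P}\big[\log P(Y,Z\mid\theta)\big] - E_{\tilde P}\big[\log \tilde P(Z)\big]
 = \log P(Y\mid\theta)
   - \sum_Z \tilde P(Z)\,\log\frac{\tilde P(Z)}{P(Z\mid Y,\theta)}
 = \log P(Y\mid\theta)
   - D_{\mathrm{KL}}\!\big(\tilde P(Z)\,\|\,P(Z\mid Y,\theta)\big).
```

Since the KL divergence is nonnegative and equals zero exactly when P̃(Z) = P(Z|Y,θ), the F function is maximized over P̃ at P̃_θ(Z) = P(Z|Y,θ), and at that point F(P̃_θ, θ) = log P(Y|θ) = L(θ). This is why alternately maximizing F over P̃ and over θ reproduces the E-step and M-step of the EM algorithm.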