General description of the EM algorithm:
When part of the data is missing or cannot be observed, the EM algorithm provides an efficient iterative procedure for computing maximum likelihood estimates of the parameters. Each iteration consists of two steps, the expectation (E) step and the maximization (M) step, hence the name EM algorithm.
Suppose the full data Y consists of the observed samples X = {x1, x2, ..., xn} and the unobserved samples Z = {z1, z2, ..., zn}, so that Y = X ∪ Z. The EM algorithm searches for the maximum likelihood estimate by maximizing the expected value of the full-data log-likelihood ln P(Y|h); note that h here is not a single variable but a set of parameters. This expectation is taken over the distribution of Y, which is determined by the unknown parameters h. However, the distribution that Z follows is unknown, so the EM algorithm uses its current hypothesis h in place of the actual parameters to estimate the distribution of Y, defining the function
Q(h'|h) = E[ ln P(Y|h') | h, X ]
The EM algorithm repeats the following two steps until convergence.
Step 1: Estimation (E) step: use the current hypothesis h and the observed data X to estimate the probability distribution over Y and compute Q(h'|h):
Q(h'|h) ← E[ ln P(Y|h') | h, X ]
Step 2: Maximization (M) step: replace the hypothesis h by the hypothesis h' that maximizes the Q function:
h ← argmax_{h'} Q(h'|h)
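To make the two-step loop concrete, here is a minimal generic sketch in Python (a sketch only, not from the source): e_step and m_step are hypothetical placeholders for the model-specific computations, and the parameters h are assumed to be a NumPy array so the convergence test is meaningful. The Gaussian mixture section below gives a concrete instance.

import numpy as np

# A minimal generic EM loop (hypothetical helper names): e_step(h, x) returns
# the expected sufficient statistics needed to evaluate Q(h'|h), and
# m_step(stats, x) returns the h' that maximizes Q(h'|h).
def em(h, x, e_step, m_step, max_iter=100, epsilon=1e-4):
    for _ in range(max_iter):
        stats = e_step(h, x)      # E step: estimate the distribution of the hidden data
        h_new = m_step(stats, x)  # M step: h <- argmax_{h'} Q(h'|h)
        if np.sum(np.abs(h_new - h)) < epsilon:  # parameters stopped changing
            return h_new
        h = h_new
    return h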
Parameter estimation for a Gaussian mixture model: for simplicity, this section studies parameter estimation for a mixture of k = 2 Gaussians.
Problem description: suppose X is generated by a mixture of k Gaussian distributions whose means differ but whose variance is the same. Let the sample values be x1, x2, ..., xn. Each xi can be represented as a (k+1)-tuple <xi, zi1, zi2, ..., zik>, where exactly one zij equals 1 and the rest are 0. The zi1, ..., zik are hidden variables and are unknown, and each zij is equally likely to be chosen a priori, i.e.
P(zij = 1) = 1/k (j = 1, 2, ..., k). The EM algorithm solution is derived as follows:
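For a mixture of k equal-variance Gaussians, the derivation yields the following standard updates (cf. the Mitchell reference at the end), which are exactly what the code below implements. E step:

$$E[z_{ij}] = \frac{p(x_i \mid \mu = \mu_j)}{\sum_{l=1}^{k} p(x_i \mid \mu = \mu_l)} = \frac{\exp\!\left(-\frac{(x_i - \mu_j)^2}{2\sigma^2}\right)}{\sum_{l=1}^{k} \exp\!\left(-\frac{(x_i - \mu_l)^2}{2\sigma^2}\right)}$$

M step:

$$\mu_j \leftarrow \frac{\sum_{i=1}^{n} E[z_{ij}]\, x_i}{\sum_{i=1}^{n} E[z_{ij}]}$$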
Python implementation (simulating the mean estimates of two normal distributions):
import math
import copy
import numpy as np
import matplotlib.pyplot as plt

isdebug = False

# Generate n samples from a mixture of k=2 Gaussian distributions that share
# the same standard deviation sigma but have different means mu1 and mu2.
def ini_data(sigma, mu1, mu2, k, n):
    global X
    global Mu
    global Expectations
    X = np.zeros((1, n))
    Mu = np.random.random(2)          # random initial guesses for the two means
    Expectations = np.zeros((n, k))
    for i in range(0, n):
        if np.random.random() > 0.5:
            X[0, i] = np.random.normal() * sigma + mu1
        else:
            X[0, i] = np.random.normal() * sigma + mu2
    if isdebug:
        print("***********")
        print("initial observed data X:")
        print(X)

# EM algorithm, step 1 (E step): compute E[z_ij] under the current means Mu.
# Only the Gaussian exponent is needed; the shared normalizing constant
# cancels in the ratio.
def e_step(sigma, k, n):
    global Expectations
    global Mu
    global X
    for i in range(0, n):
        denom = 0
        for j in range(0, k):
            denom += math.exp((-1 / (2 * float(sigma ** 2))) * (float(X[0, i] - Mu[j])) ** 2)
        for j in range(0, k):
            numer = math.exp((-1 / (2 * float(sigma ** 2))) * (float(X[0, i] - Mu[j])) ** 2)
            Expectations[i, j] = numer / denom
    if isdebug:
        print("***********")
        print("hidden variable E(Z):")
        print(Expectations)

# EM algorithm, step 2 (M step): re-estimate the means Mu from E[z_ij].
def m_step(k, n):
    global Expectations
    global X
    global Mu
    for j in range(0, k):
        numer = 0
        denom = 0
        for i in range(0, n):
            numer += Expectations[i, j] * X[0, i]
            denom += Expectations[i, j]
        Mu[j] = numer / denom

# Iterate at most iter_num times, stopping early once the change in Mu
# falls below epsilon.
def run(sigma, mu1, mu2, k, n, iter_num, epsilon):
    ini_data(sigma, mu1, mu2, k, n)
    print("initial <u1,u2>:", Mu)
    for i in range(iter_num):
        old_mu = copy.deepcopy(Mu)
        e_step(sigma, k, n)
        m_step(k, n)
        print(i, Mu)
        if np.sum(np.abs(Mu - old_mu)) < epsilon:
            break

if __name__ == '__main__':
    run(6, 40, 20, 2, 1000, 1000, 0.0001)
    plt.hist(X[0, :], 50)
    plt.show()
This code simulates mean estimation for a k = 2 Gaussian mixture. The function ini_data(sigma, mu1, mu2, k, n) generates the training sample by drawing randomly from two Gaussian distributions: distribution A has mean mu1 = 40 and standard deviation sigma = 6, and distribution B has mean mu2 = 20 with the same standard deviation sigma = 6. The resulting sample distribution is shown in Figure 1. Because the parameters of the two distributions cannot be read off the pooled data directly, the EM algorithm is needed to estimate the actual values of mu1 and mu2.
Figure 1: Sample data distribution
For the sample data in Figure 1, the iteration terminates at step 11, and the EM estimates are:
mu=[40.55261688 19.34252468]
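As a sanity check (not part of the original post), a comparable estimate can be obtained with scikit-learn's GaussianMixture, assuming scikit-learn is installed; covariance_type='tied' matches the shared-variance assumption, and the sample regenerated below merely mimics the one above:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
n = 1000
# Two Gaussians with sigma=6 and means 40 and 20, mixed with equal probability.
x = np.where(rng.random(n) > 0.5,
             rng.normal(40, 6, n),
             rng.normal(20, 6, n))

gm = GaussianMixture(n_components=2, covariance_type='tied').fit(x.reshape(-1, 1))
print(gm.means_.ravel())  # should land near [40, 20] (component order may vary)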
References: Tom M. Mitchell, Machine Learning, p. 137.