Python realization of parameter estimation of a Gaussian mixture model by the EM algorithm


General description of the EM algorithm:

When part of the data is missing or cannot be observed, the EM algorithm provides an efficient iterative procedure for computing the maximum likelihood estimate from the observed data. Each iteration consists of two steps: the expectation (E) step and the maximization (M) step, which is why it is called the EM algorithm.

Assume that the full data Y consists of the observable samples X = {x1, x2, ..., xn} and the unobservable samples Z = {z1, z2, ..., zn}, so that Y = X ∪ Z. The EM algorithm searches for the maximum likelihood estimate by maximizing the expected value of the log-likelihood ln P(Y | h) of the full data; note that h here is not a single variable but a set of parameters consisting of multiple variables. This expectation is taken over the probability distribution of Y, which is governed by the unknown parameters h. Since the distribution that Z follows is unknown, the EM algorithm uses its current hypothesis h in place of the actual parameters to estimate the distribution of Z, and evaluates candidate hypotheses h' through the function

Q(h' | h) = E[ ln P(Y | h') | h, X ]

The EM algorithm repeats the following two steps until convergence.

Step 1 (Estimation, E step): Use the current hypothesis h and the observed data X to estimate the probability distribution over Y and compute Q(h' | h):

Q(h' | h) ← E[ ln P(Y | h') | h, X ]

Step 2 (Maximization, M step): Replace the hypothesis h by the hypothesis h' that maximizes the Q function:

h ← argmax_{h'} Q(h' | h)
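
To make the two steps concrete before turning to the mixture model, here is a compact vectorized sketch, not from the original post, that instantiates the generic loop on the two-Gaussian means problem studied below (the data-generation constants match the experiment later in the article):

import numpy as np

rng = np.random.default_rng(0)
sigma = 6.0
mu_true = np.array([40.0, 20.0])
# observed data: 1000 points, each drawn from one of the two Gaussians at random
x = rng.normal(mu_true[rng.integers(0, 2, size=1000)], sigma)

mu = rng.random(2)  # current hypothesis h
for step in range(1000):
    # E step: responsibilities E[zij] under the current hypothesis h
    w = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    # M step: h <- argmax_h' Q(h' | h); here, the responsibility-weighted means
    new_mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    converged = np.abs(new_mu - mu).sum() < 1e-4
    mu = new_mu
    if converged:
        break
print(mu)  # lands near [40, 20] (component order may be swapped)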


Parameter estimation for a Gaussian mixture model: for the sake of simplicity, this problem studies parameter estimation for a mixture of k = 2 Gaussian distributions.
Problem description: Suppose X is generated by a mixture of k Gaussian distributions that have different means but the same variance. Let the sample values be x1, x2, ..., xn; each xi can be represented as a (k+1)-tuple <xi, zi1, zi2, ..., zik>, in which exactly one zij is 1 and the rest are 0. The zi1, ..., zik are hidden variables and are unknown, and each zij is equally likely to be chosen, i.e.
P(zij = 1) = 1/k (j = 1, 2, ..., k). The EM algorithm solution is derived as follows:
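
The derivation yields the standard EM updates for this model, which the implementation below follows. In the E step, the expected value of each hidden variable given the current means is

E[zij] = exp( -(xi - μj)² / (2σ²) ) / Σ_{l=1}^{k} exp( -(xi - μl)² / (2σ²) )

and in the M step each mean is re-estimated as the responsibility-weighted sample mean

μj ← ( Σ_{i=1}^{n} E[zij] · xi ) / ( Σ_{i=1}^{n} E[zij] )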
Python implementation (simulating the mean estimation of a mixture of 2 normal distributions):

import math
import copy
import numpy as np
import matplotlib.pyplot as plt

isdebug = False

# Specify the parameters of the k Gaussian distributions; here k = 2.
# The two Gaussians share the same standard deviation sigma and have means mu1, mu2.
def ini_data(sigma, mu1, mu2, k, N):
    global X
    global Mu
    global Expectations
    X = np.zeros((1, N))
    Mu = np.random.random(k)
    Expectations = np.zeros((N, k))
    for i in range(0, N):
        if np.random.random(1) > 0.5:
            X[0, i] = np.random.normal() * sigma + mu1
        else:
            X[0, i] = np.random.normal() * sigma + mu2
    if isdebug:
        print("***********")
        print("initial observed data X:")
        print(X)

# EM algorithm: step 1, compute E[zij]
def e_step(sigma, k, N):
    global Expectations
    global Mu
    global X
    for i in range(0, N):
        denom = 0
        for j in range(0, k):
            denom += math.exp(-((X[0, i] - Mu[j]) ** 2) / (2 * sigma ** 2))
        for j in range(0, k):
            numer = math.exp(-((X[0, i] - Mu[j]) ** 2) / (2 * sigma ** 2))
            Expectations[i, j] = numer / denom
    if isdebug:
        print("***********")
        print("hidden variable E(Z):")
        print(Expectations)

# EM algorithm: step 2, maximize; re-estimate the means Mu from E[zij]
def m_step(k, N):
    global Expectations
    global X
    for j in range(0, k):
        numer = 0
        denom = 0
        for i in range(0, N):
            numer += Expectations[i, j] * X[0, i]
            denom += Expectations[i, j]
        Mu[j] = numer / denom

# Iterate at most iter_num times; stop early once the change in Mu falls below epsilon
def run(sigma, mu1, mu2, k, N, iter_num, epsilon):
    ini_data(sigma, mu1, mu2, k, N)
    print("initial <u1,u2>:", Mu)
    for i in range(iter_num):
        old_mu = copy.deepcopy(Mu)
        e_step(sigma, k, N)
        m_step(k, N)
        print(i, Mu)
        if sum(abs(Mu - old_mu)) < epsilon:
            break

if __name__ == '__main__':
    run(6, 40, 20, 2, 1000, 1000, 0.0001)
    plt.hist(X[0, :], 50)
    plt.show()

This code simulates mean estimation for a mixture of k = 2 normal distributions. The ini_data(sigma, mu1, mu2, k, N) function generates the training samples, which are drawn at random from two Gaussian distributions: Gaussian A with mean mu1 = 40 and standard deviation sigma = 6, and Gaussian B with mean mu2 = 20 and the same standard deviation sigma = 6. The resulting sample distribution is shown in Figure 1. Because the parameters of the two Gaussians cannot be learned directly from the data, the EM algorithm is needed to estimate the actual values of mu1 and mu2.


Figure 1. Sample data distribution

For the sample data in Figure 1, the iteration terminates at step 11, and the EM estimates are:

mu=[40.55261688 19.34252468]
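
As a sanity check, not part of the original post, a minimal sketch with scikit-learn's GaussianMixture recovers similar means on data generated the same way:

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# same sampling scheme as ini_data: each point comes from N(40, 6^2) or N(20, 6^2)
means = np.where(rng.random(1000) > 0.5, 40.0, 20.0)
x = rng.normal(means, 6.0).reshape(-1, 1)

# covariance_type='tied' shares one covariance across components,
# matching the common-sigma assumption of the model above
gm = GaussianMixture(n_components=2, covariance_type='tied', random_state=0).fit(x)
print(gm.means_.ravel())  # expected near [40, 20]; component order is arbitrary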



Reference: Tom M. Mitchell, Machine Learning, p. 137.
