A simple and easy-to-learn machine learning algorithm--EM algorithm


I. The problem of parameter estimation in machine learning

In an earlier post in this series, "A simple and easy-to-learn machine learning algorithm -- logistic regression", the model parameters were estimated by maximizing the likelihood function. In a nutshell, logistic regression is a supervised learning problem over a set of samples, where each sample contains both training features and a label. To solve for the parameters of logistic regression, we construct the probabilities that a sample belongs to category $1$ and to category $0$:

$$P(y=1\mid x;\theta)=h_\theta(x),\qquad P(y=0\mid x;\theta)=1-h_\theta(x),$$

where $h_\theta(x)$ is the logistic (sigmoid) function of $\theta^{T}x$. This gives a single probability function over the two classes:

$$P(y\mid x;\theta)=\left(h_\theta(x)\right)^{y}\left(1-h_\theta(x)\right)^{1-y}.$$

At this point, the parameters of the model can be estimated by maximum likelihood estimation. If, however, the labels are unknown, they become latent (hidden) variables, as in unsupervised learning problems such as the K-Means clustering algorithm, and the parameters of the model can no longer be estimated directly by maximum likelihood.
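To make the contrast concrete, the following is a minimal sketch of maximum likelihood estimation for logistic regression when the labels are observed (my own illustration, not from the original article; the function names and the use of scipy.optimize are assumptions):

# Sketch: maximum likelihood for logistic regression with observed labels.
# Names (neg_log_likelihood, fit_logistic) and the use of scipy.optimize.minimize
# are illustrative assumptions, not taken from the original post.
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(theta, X, y):
    # -L(theta) = -sum_i [ y_i * log h(x_i) + (1 - y_i) * log(1 - h(x_i)) ]
    h = sigmoid(X.dot(theta))
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def fit_logistic(X, y):
    theta0 = np.zeros(X.shape[1])
    res = minimize(neg_log_likelihood, theta0, args=(X, y))
    return res.x

if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = np.c_[np.ones(200), rng.randn(200, 2)]      # intercept column plus two features
    true_theta = np.array([0.5, 2.0, -1.0])
    y = (rng.rand(200) < sigmoid(X.dot(true_theta))).astype(float)
    print(fit_logistic(X, y))   # labels are observed, so plain MLE works

When the labels y are hidden, this direct maximization is no longer possible, which is exactly the situation the EM algorithm addresses.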

II. Introduction to the EM algorithm

For problems with latent variables, the model parameters cannot be obtained directly by maximum likelihood estimation; the EM algorithm is an effective method for solving such latent-variable optimization problems. EM is short for Expectation-Maximization. It is an iterative algorithm, and each iteration consists of two main steps: the expectation (E) step and the maximization (M) step.

III. Preparation for the derivation of the EM algorithm

1. Convex functions

Let $f$ be a function defined on the real domain. If for any real number $x$

$$f''(x)\geq 0,$$

then $f$ is a convex function. If the argument is not a single real number but a vector of real numbers, then $f$ is convex if the Hessian matrix $H$ of $f$ is positive semi-definite, that is,

$$H\succeq 0.$$

In particular, if $f''(x)>0$ or $H\succ 0$, then $f$ is called a strictly convex function.
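As a quick illustration (my own sketch, not from the original article), convexity of a function of a vector argument can be checked by inspecting the eigenvalues of its Hessian; the example below uses $f(x,y)=x^2+y^2$, whose Hessian is constant:

# Sketch (illustrative only): check convexity of f(x, y) = x**2 + y**2
# by verifying that its Hessian is positive semi-definite.
import numpy as np

# Hessian of f(x, y) = x^2 + y^2 is constant: [[2, 0], [0, 2]]
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])

eigenvalues = np.linalg.eigvalsh(H)
print(eigenvalues)                   # [2. 2.]
print(np.all(eigenvalues >= 0))      # True -> f is convex (here strictly convex)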

2. Jensen's inequality

If $f$ is a convex function and $X$ is a random variable, then

$$E\left[f(X)\right]\geq f\left(E[X]\right).$$

In particular, if $f$ is a strictly convex function, then $E[f(X)]=f(E[X])$ holds if and only if

$$P\left(X=E[X]\right)=1,$$

that is, the random variable $X$ is a constant.

(Figure: illustration of Jensen's inequality for a convex function; image from reference article 1.)

Note: if $f$ is a concave function, the direction of the inequality above is reversed.
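A quick numerical check of Jensen's inequality (my own sketch, not part of the original article), using the convex function $f(x)=x^2$ and the concave function $\log$:

# Sketch: empirically verify Jensen's inequality E[f(X)] >= f(E[X])
# for the convex function f(x) = x**2. Illustrative only.
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100000)          # samples of a random variable X

lhs = np.mean(X ** 2)          # E[f(X)]
rhs = np.mean(X) ** 2          # f(E[X])
print(lhs, rhs, lhs >= rhs)    # lhs ~ 1.0, rhs ~ 0.0, True

# For the concave function log the inequality is reversed: E[log Y] <= log E[Y]
Y = rng.rand(100000) + 0.5
print(np.mean(np.log(Y)) <= np.log(np.mean(Y)))   # True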

3. Mathematical expectation

3.1 Mathematical expectation of a random variable

If the probability distribution of a discrete random variable $X$ is

$$P(X=x_k)=p_k,\quad k=1,2,\ldots$$

and the series $\sum_{k}x_k p_k$ converges absolutely, then its sum is called the mathematical expectation of $X$, written as

$$E(X)=\sum_{k} x_k\,p_k.$$

If the probability density function of a continuous random variable $X$ is $f(x)$, then its mathematical expectation is

$$E(X)=\int_{-\infty}^{+\infty} x\,f(x)\,dx.$$

3.2 Mathematical expectation of a function of a random variable

Let $Y=g(X)$ be a function of the random variable $X$. If $X$ is a discrete random variable with probability distribution

$$P(X=x_k)=p_k,\quad k=1,2,\ldots$$

then

$$E(Y)=E\left[g(X)\right]=\sum_{k} g(x_k)\,p_k.$$

If $X$ is a continuous random variable with probability density function $f(x)$, then

$$E(Y)=E\left[g(X)\right]=\int_{-\infty}^{+\infty} g(x)\,f(x)\,dx.$$
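As a small numerical illustration (my own sketch, not from the original article; the distributions used here are assumptions), both kinds of expectation can be approximated directly:

# Sketch: mathematical expectation of a random variable and of a function of it.
# The distributions chosen below are illustrative assumptions.
import numpy as np

# Discrete case: P(X = 1) = 0.2, P(X = 2) = 0.5, P(X = 3) = 0.3
x = np.array([1.0, 2.0, 3.0])
p = np.array([0.2, 0.5, 0.3])
print(np.sum(x * p))                 # E(X) = 2.1
print(np.sum((x ** 2) * p))          # E[g(X)] with g(x) = x^2

# Continuous case: standard normal density, E(X) and E(X^2) by numerical integration
t = np.linspace(-10.0, 10.0, 200001)
dt = t[1] - t[0]
f = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
print(np.sum(t * f) * dt)            # E(X)   ~ 0.0
print(np.sum((t ** 2) * f) * dt)     # E(X^2) ~ 1.0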

IV. The solution process of the EM algorithm

Suppose the observed variables are denoted $x=(x^{(1)},\ldots,x^{(m)})$ and the latent variables are denoted $z=(z^{(1)},\ldots,z^{(m)})$, so that $(x,z)$ is the complete data; $\theta$ is the parameter to be estimated, and the likelihood function of the complete data is $P(x,z;\theta)$. To estimate the parameters, we would like to apply maximum likelihood estimation to the given observation data. Because the latent variables $z$ are unknown, we can only apply maximum likelihood to the observed data, that is, we need to maximize

$$L(\theta)=\sum_{i=1}^{m}\log P\left(x^{(i)};\theta\right)=\sum_{i=1}^{m}\log\sum_{z^{(i)}}P\left(x^{(i)},z^{(i)};\theta\right).$$

This expression cannot be maximized directly, because the function contains the latent variables $z^{(i)}$, which are unknown. If we could determine the values (or the distribution) of the latent variables, we could maximize over the parameters; by repeatedly re-estimating the latent variables and re-maximizing, we obtain ever larger values. This is the idea of the EM algorithm: the parameters are obtained by iteration.

First, assign initial values to the parameters and begin iterating. Suppose the estimate of the parameters after the $j$-th iteration is $\theta^{(j)}$. Introducing for each $i$ a distribution $Q_i$ over the latent variable $z^{(i)}$, with $\sum_{z^{(i)}}Q_i(z^{(i)})=1$, the log-likelihood can be written as

$$L(\theta)=\sum_{i}\log\sum_{z^{(i)}}P\left(x^{(i)},z^{(i)};\theta\right)=\sum_{i}\log\sum_{z^{(i)}}Q_i\left(z^{(i)}\right)\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{Q_i\left(z^{(i)}\right)}\geq\sum_{i}\sum_{z^{(i)}}Q_i\left(z^{(i)}\right)\log\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{Q_i\left(z^{(i)}\right)}.$$

The step from the second expression to the third uses Jensen's inequality: because $\log$ is a concave function, $\log E[Y]\geq E[\log Y]$, where the expectation is taken with respect to $Q_i$, the distribution that the latent variable satisfies, and $Y$ stands for the ratio $P(x^{(i)},z^{(i)};\theta)/Q_i(z^{(i)})$. The value of this lower bound therefore depends on the two probabilities $Q_i(z^{(i)})$ and $P(x^{(i)},z^{(i)};\theta)$. During the iterations these two probabilities are adjusted so that the lower bound keeps rising, and in this way the maximum of $L(\theta)$ is approached.

Note when equality holds: by Jensen's inequality, equality requires the random variable to be a constant, i.e.

$$\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{Q_i\left(z^{(i)}\right)}=c$$

for some constant $c$ that does not depend on $z^{(i)}$. Since it is known that $\sum_{z^{(i)}}Q_i(z^{(i)})=1$, summing gives $\sum_{z^{(i)}}P(x^{(i)},z^{(i)};\theta)=c$, and therefore

$$Q_i\left(z^{(i)}\right)=\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{\sum_{z}P\left(x^{(i)},z;\theta\right)}=\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{P\left(x^{(i)};\theta\right)}=P\left(z^{(i)}\mid x^{(i)};\theta\right).$$

So far we have obtained the form of the distribution that the latent variable satisfies: the posterior of $z^{(i)}$ given $x^{(i)}$ under the current parameters. This is the E-step of the EM algorithm. After $Q_i$ has been determined, the parameters $\theta$ are adjusted to maximize the lower bound; this is the M-step. The steps of the EM algorithm are:
    1. Initialize the parameters $\theta^{(0)}$ and begin the iteration;
    2. E-step: let $\theta^{(j)}$ be the parameter estimate from the $j$-th iteration; in the $(j+1)$-th iteration, compute
       $$Q_i\left(z^{(i)}\right)=P\left(z^{(i)}\mid x^{(i)};\theta^{(j)}\right);$$
    3. M-step: maximize the lower bound to obtain the parameter estimate for the $(j+1)$-th iteration:
       $$\theta^{(j+1)}=\arg\max_{\theta}\sum_{i}\sum_{z^{(i)}}Q_i\left(z^{(i)}\right)\log\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{Q_i\left(z^{(i)}\right)}.$$
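To make the structure of the two steps concrete in code, here is a minimal, generic sketch of the EM loop (my own illustration, not from the original article; the helper names run_em, e_step and m_step are assumptions). The Gaussian-mixture example in the last section fills in these two steps concretely.

# Sketch of the generic EM loop. e_step / m_step are placeholders that a concrete
# model (e.g. the Gaussian-mixture example below) must supply; names are assumptions.
import numpy as np

def run_em(x, theta, e_step, m_step, max_iter=100, tol=1e-4):
    """Iterate E and M steps until the parameter estimate stops changing."""
    for step in range(max_iter):
        q = e_step(x, theta)          # E-step: posterior of the latent variables
        new_theta = m_step(x, q)      # M-step: maximize the lower bound over theta
        if np.sum(np.abs(new_theta - theta)) < tol:
            theta = new_theta
            break
        theta = new_theta
    return theta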
V. Convergence of the EM algorithm

Does the iterative process guarantee that a maximum of the likelihood is eventually found? To answer this, it suffices to show that the likelihood objective increases monotonically during the whole iterative process, i.e. $L(\theta^{(j+1)})\geq L(\theta^{(j)})$. Suppose $\theta^{(j)}$ and $\theta^{(j+1)}$ are the results after the $j$-th and the $(j+1)$-th iterations of the EM algorithm. Having selected $Q_i^{(j)}(z^{(i)})=P(z^{(i)}\mid x^{(i)};\theta^{(j)})$, the iteration proceeds as:
    1. E-step: with this choice of $Q_i^{(j)}$, Jensen's inequality holds with equality, so
       $$L\left(\theta^{(j)}\right)=\sum_{i}\sum_{z^{(i)}}Q_i^{(j)}\left(z^{(i)}\right)\log\frac{P\left(x^{(i)},z^{(i)};\theta^{(j)}\right)}{Q_i^{(j)}\left(z^{(i)}\right)}.$$
    2. M-step: fix $Q_i^{(j)}$ and regard $\theta$ as the variable; $\theta^{(j+1)}$ is chosen to maximize the lower bound, so
       $$L\left(\theta^{(j+1)}\right)\geq\sum_{i}\sum_{z^{(i)}}Q_i^{(j)}\left(z^{(i)}\right)\log\frac{P\left(x^{(i)},z^{(i)};\theta^{(j+1)}\right)}{Q_i^{(j)}\left(z^{(i)}\right)}\geq\sum_{i}\sum_{z^{(i)}}Q_i^{(j)}\left(z^{(i)}\right)\log\frac{P\left(x^{(i)},z^{(i)};\theta^{(j)}\right)}{Q_i^{(j)}\left(z^{(i)}\right)}=L\left(\theta^{(j)}\right).$$

In the chain above, the first inequality holds because the sum is a lower bound on $L(\theta)$ for any $\theta$ (by Jensen's inequality), and the second holds because $\theta^{(j+1)}$ maximizes that lower bound over $\theta$. Hence the likelihood is non-decreasing across iterations.
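The monotonicity can also be checked numerically. The sketch below (my own illustration, anticipating the two-Gaussian example of the next section; the sample size and parameter values are assumptions) runs EM with a known, shared variance and asserts that the observed-data log-likelihood never decreases:

# Sketch (illustrative assumptions): track the observed-data log-likelihood across
# EM iterations for a two-component Gaussian mixture with known, equal variance,
# and check that it never decreases. Mixing weights are fixed at 1/2 for simplicity.
import numpy as np

rng = np.random.RandomState(0)
sigma = 6.0
x = np.concatenate([rng.randn(500) * sigma + 40.0,
                    rng.randn(500) * sigma + 20.0])

def log_likelihood(x, mu, sigma):
    # log p(x) = sum_i log( (1/2) N(x_i; mu_1, sigma^2) + (1/2) N(x_i; mu_2, sigma^2) )
    dens = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma ** 2))
    dens /= np.sqrt(2 * np.pi) * sigma
    return np.sum(np.log(0.5 * dens.sum(axis=1)))

mu = np.array([1.0, 2.0])                 # deliberately poor initial guess
prev = -np.inf
for step in range(50):
    # E-step: responsibilities of each component for each point
    w = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted means
    mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    cur = log_likelihood(x, mu, sigma)
    assert cur >= prev - 1e-6             # log-likelihood is non-decreasing
    prev = cur
print(mu, prev)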

VI. A worked example of parameter estimation with the EM algorithm

Suppose we have a batch of data generated by a mixture of two normal distributions,

$$N\left(\mu_1,\sigma^2\right)\quad\text{and}\quad N\left(\mu_2,\sigma^2\right),$$

where the means $\mu_1$ and $\mu_2$ are unknown and the variance $\sigma^2$ is known and shared by both distributions. For each data point we do not know which of the two distributions generated it; this membership can be expressed with indicator variables $z_1$ and $z_2$. This is a typical example of a latent-variable problem: the latent variables are $z_1$ and $z_2$, and the parameters $\mu_1,\mu_2$ can be estimated with the EM algorithm.

    1. First initialize $\mu_1$ and $\mu_2$;
    2. E-step: compute the probability that data point $x_i$ was generated by the $j$-th distribution,
       $$E_{ij}=\frac{\exp\left(-\frac{1}{2\sigma^2}\left(x_i-\mu_j\right)^2\right)}{\sum_{k=1}^{2}\exp\left(-\frac{1}{2\sigma^2}\left(x_i-\mu_k\right)^2\right)};$$
    3. M-step: maximize the expectation. Since the only parameters required here are the means, they can be estimated as
       $$\mu_j=\frac{\sum_{i}E_{ij}\,x_i}{\sum_{i}E_{ij}}.$$

Python code

#coding: UTF-8
'''
Created on June 7, 2015
@author: Zhaozhiyong
'''
from __future__ import division
from numpy import *
import math as mt

# First generate some samples for testing.
# Specify the parameters of the two Gaussian distributions;
# the two distributions share the same variance.
sigma = 6
miu_1 = 40  # assumed value, consistent with the commented-out initialization and the final result
miu_2 = 20  # assumed value, consistent with the commented-out initialization and the final result
# Randomly (uniformly) choose one of the two Gaussians to generate each sample value.
N = 1000  # number of samples (assumed value)
x = zeros((1, N))
for i in range(N):
    if random.random() > 0.5:  # random here is numpy's random module
        x[0, i] = random.randn() * sigma + miu_1
    else:
        x[0, i] = random.randn() * sigma + miu_2
# The samples have now been generated.
# For the generated samples, estimate the means miu with the EM algorithm.
# Initial value of miu.
k = 2
miu = random.random((1, k))
# miu = mat([40.0, 20.0])
expectations = zeros((N, k))
for step in range(1000):  # maximum number of iterations (assumed value)
    # Step 1 (E-step): compute the expectations.
    for i in range(N):
        # Compute the denominator.
        denominator = 0
        for j in range(k):
            denominator = denominator + mt.exp(-1 / (2 * sigma ** 2) * (x[0, i] - miu[0, j]) ** 2)
        # Compute the numerator for each component.
        for j in range(k):
            numerator = mt.exp(-1 / (2 * sigma ** 2) * (x[0, i] - miu[0, j]) ** 2)
            expectations[i, j] = numerator / denominator
    # Step 2 (M-step): maximize the expectation.
    # oldMiu = miu
    oldMiu = zeros((1, k))
    for j in range(k):
        oldMiu[0, j] = miu[0, j]
        numerator = 0
        denominator = 0
        for i in range(N):
            numerator = numerator + expectations[i, j] * x[0, i]
            denominator = denominator + expectations[i, j]
        miu[0, j] = numerator / denominator
    # Check whether the convergence criterion is satisfied.
    epsilon = 0.0001
    if abs(miu - oldMiu).sum() < epsilon:
        break
    print(step)
    print(miu)
print(miu)


Final result

[[40.49487592 19.96497512]]

Reference articles:

1. (EM algorithm) The EM Algorithm: http://www.cnblogs.com/jerrylead/archive/2011/04/06/2006936.html

2. Mathematical expectation: http://wenku.baidu.com/view/915a9c1ec5da50e2524d7f08.html?re=view
