A simple and easy-to-learn machine learning algorithm--EM algorithm
I. The problem of parameter estimation in machine learning
In a previous post in this series, "An easy-to-learn machine learning algorithm: logistic regression," the parameters of the model were estimated with the maximum likelihood function. In short, logistic regression is a supervised learning problem: each sample contains both training features and a label. To solve for the parameters of logistic regression, we construct the probabilities that a sample belongs to category $1$ and to category $0$:

$$P(y=1 \mid x; \theta) = h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}, \qquad P(y=0 \mid x; \theta) = 1 - h_\theta(x).$$

These combine into a single probability function of logistic regression over the two classes:

$$P(y \mid x; \theta) = \left(h_\theta(x)\right)^{y} \left(1 - h_\theta(x)\right)^{1-y}.$$
At this point, the parameters of the model can be estimated by maximum likelihood estimation. However, if the labels are unknown (they are then called latent, or hidden, variables), as in unsupervised learning problems such as the K-means clustering algorithm, the parameters of the model can no longer be estimated directly by maximum likelihood.
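As a quick illustration of the supervised case, here is a minimal sketch of the log-likelihood that maximum likelihood estimation maximizes for logistic regression (the helper names are mine, not from the original post):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    # sum_i [ y_i * log h(x_i) + (1 - y_i) * log(1 - h(x_i)) ]
    h = sigmoid(X.dot(theta))
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

With the labels y observed, this function can be maximized directly; when y is missing, the sum over the unknown labels moves inside the logarithm and direct maximization breaks down, which is exactly the situation the EM algorithm addresses.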
II. Introduction to the EM algorithm
In problems with latent variables, the parameters of the model cannot be obtained directly by maximum likelihood estimation; the EM algorithm is an effective method for solving such latent-variable optimization problems. EM is short for Expectation-Maximization. It is an iterative algorithm, and each iteration consists of two main steps: the expectation (E) step and the maximization (M) step.
III. Preparations for the derivation of the EM algorithm
1. Convex functions
Let $f$ be a function defined on the real numbers. If for every real number $x$ we have

$$f''(x) \ge 0,$$

then $f$ is a convex function. If the argument is not a single real number but a vector of real numbers, then $f$ is convex if its Hessian matrix $H$ is positive semi-definite, that is,

$$H \succeq 0.$$

In particular, if $f''(x) > 0$ or $H \succ 0$, then $f$ is called a strictly convex function.
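As a standard sanity check of the definition (an example of mine, not from the original post):

$$\frac{d^2}{dx^2}\, x^2 = 2 > 0 \quad\Rightarrow\quad x^2 \text{ is strictly convex},$$
$$\frac{d^2}{dx^2}\, \log x = -\frac{1}{x^2} < 0 \quad\Rightarrow\quad \log x \text{ is strictly concave}.$$

The concavity of $\log$ is what determines the direction of the inequality in the EM derivation below.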
2. Jensen's inequality
If $f$ is a convex function and $X$ is a random variable, then

$$E[f(X)] \ge f(E[X]).$$

In particular, if $f$ is strictly convex, then equality holds if and only if

$$P(X = E[X]) = 1,$$

that is, the random variable $X$ is a constant.
(Figure: an illustration of Jensen's inequality for a convex function; image from reference article 1.)
Note: if the function $f$ is concave, the inequality above is reversed: $E[f(X)] \le f(E[X])$.
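A minimal numeric check of Jensen's inequality (an illustrative snippet, assuming NumPy):

import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100000)  # samples of a random variable X

# Convex f(x) = exp(x): E[f(X)] >= f(E[X])  (about 1.65 >= 1.0 here)
print(np.mean(np.exp(x)), np.exp(np.mean(x)))

# Concave f(x) = log(x) on a positive variable: the inequality reverses
y = np.abs(x) + 1.0
print(np.mean(np.log(y)), np.log(np.mean(y)))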
3. Mathematical expectation
3.1 Mathematical expectation of a random variable
Suppose the probability distribution of a discrete random variable $X$ is

$$P(X = x_k) = p_k, \quad k = 1, 2, \ldots$$

If the series $\sum_k x_k p_k$ converges absolutely, its sum is called the mathematical expectation of $X$, written

$$E(X) = \sum_k x_k p_k.$$

If the probability density function of a continuous random variable $X$ is $f(x)$, then the mathematical expectation is

$$E(X) = \int_{-\infty}^{+\infty} x f(x)\, dx.$$
3.2 Mathematical expectation of a function of a random variable
Let $Y = g(X)$ be a function of the random variable $X$. If $X$ is a discrete random variable with probability distribution

$$P(X = x_k) = p_k, \quad k = 1, 2, \ldots$$

then

$$E(Y) = E[g(X)] = \sum_k g(x_k)\, p_k.$$

If $X$ is a continuous random variable with probability density function $f(x)$, then

$$E(Y) = E[g(X)] = \int_{-\infty}^{+\infty} g(x) f(x)\, dx.$$
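For instance, a direct computation for a small discrete distribution (an illustrative example of mine):

import numpy as np

xk = np.array([1.0, 2.0, 3.0])   # values of X
pk = np.array([0.2, 0.5, 0.3])   # P(X = x_k)

EX = np.sum(xk * pk)             # E(X) = 2.1
EgX = np.sum(xk ** 2 * pk)       # E[g(X)] with g(x) = x^2, giving 4.9
print(EX, EgX)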
IV. The solution process of the EM algorithm

Suppose the observed variables are $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ and the latent variables are $Z = \{z^{(1)}, z^{(2)}, \ldots, z^{(m)}\}$, and let $\theta$ be the parameter to be estimated; the likelihood function for the complete data is $P(X, Z; \theta)$. To estimate $\theta$, we would like to apply maximum likelihood estimation to the given observed data. Because the variables $Z$ are unknown, we can only apply maximum likelihood estimation to the likelihood of the observed data; that is, we need to maximize

$$\ell(\theta) = \sum_{i=1}^{m} \log P(x^{(i)}; \theta) = \sum_{i=1}^{m} \log \sum_{z^{(i)}} P(x^{(i)}, z^{(i)}; \theta).$$

This expression cannot be maximized directly, because the latent variable $z^{(i)}$, an unknown variable, appears inside the logarithm. If at this point we could determine the values of the latent variables, the maximum could be found; so we may repeatedly revise the values of the latent variables to obtain new, larger values of the objective. This is the idea of the EM algorithm: the parameters are obtained by iteration. First we assign initial values to the parameters and begin iterating. Suppose the parameter estimate after the $t$-th iteration is $\theta^{(t)}$, and let $Q_i$ denote some distribution that the latent variable satisfies ($\sum_{z} Q_i(z) = 1$, $Q_i(z) \ge 0$). The log-likelihood can then be bounded as follows:

$$\begin{aligned}
\ell(\theta) &= \sum_i \log P(x^{(i)}; \theta) \\
&= \sum_i \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} \\
&\ge \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}.
\end{aligned}$$

The step from the second line to the third uses Jensen's inequality: the $\log$ function is concave, and the inner sum $\sum_{z^{(i)}} Q_i(z^{(i)}) \frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ is exactly the expectation of $\frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})}$ under the distribution $Q_i$ that the latent variable satisfies. The value of the lower bound therefore depends on the two probabilities $Q_i(z^{(i)})$ and $P(x^{(i)}, z^{(i)}; \theta)$. In the iterative process, these two probabilities are adjusted so that the lower bound keeps rising, and in this way the maximum of $\ell(\theta)$ can be approached. Note that when equality holds, the lower bound coincides with $\ell(\theta)$. By Jensen's inequality, the condition for equality is that the random variable be constant, i.e.

$$\frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i(z^{(i)})} = c$$

for some constant $c$ not depending on $z^{(i)}$. Since it is known that $\sum_{z} Q_i(z) = 1$, summing the relation over $z^{(i)}$ gives $\sum_{z} P(x^{(i)}, z; \theta) = c$, so

$$Q_i(z^{(i)}) = \frac{P(x^{(i)}, z^{(i)}; \theta)}{\sum_{z} P(x^{(i)}, z; \theta)} = \frac{P(x^{(i)}, z^{(i)}; \theta)}{P(x^{(i)}; \theta)} = P(z^{(i)} \mid x^{(i)}; \theta).$$

So far, we have obtained the form of the distribution $Q_i$ that the latent variable satisfies: it is the posterior distribution of $z^{(i)}$ given $x^{(i)}$. Computing it is the E-step of the EM algorithm. After $Q_i$ is determined, adjusting the parameters $\theta$ to make the lower bound large is the M-step. The steps of the EM algorithm are (a generic code sketch follows this list):
- Initialize the parameters $\theta^{(0)}$ and begin the iteration;
- E-step: let $\theta^{(t)}$ be the parameter estimate from the $t$-th iteration; in the $(t+1)$-th iteration, compute
$$Q_i^{(t)}(z^{(i)}) = P(z^{(i)} \mid x^{(i)}; \theta^{(t)});$$
- M-step: maximize the lower bound to determine the parameter estimate for the $(t+1)$-th iteration:
$$\theta^{(t+1)} = \arg\max_{\theta} \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i^{(t)}(z^{(i)})}.$$
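The following is a minimal generic sketch of this loop; the function names em, e_step, and m_step and the simple convergence test are illustrative assumptions, not part of the original post:

import numpy as np

def em(x, theta0, e_step, m_step, max_iter=100, tol=1e-4):
    # Generic EM loop: alternate the E-step and the M-step until
    # the parameter vector stops changing (a simplified stopping rule).
    theta = theta0
    for step in range(max_iter):
        Q = e_step(x, theta)       # E-step: posterior of the latent variables
        new_theta = m_step(x, Q)   # M-step: maximize the Jensen lower bound
        if np.sum(np.abs(new_theta - theta)) < tol:
            theta = new_theta
            break
        theta = new_theta
    return theta

The concrete e_step and m_step for the two-Gaussian example of section VI appear in the full listing below.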
V. The convergence guarantee of the EM algorithm

Does the iterative process guarantee that a maximum of the likelihood function is eventually found? It suffices to prove that the maximum likelihood estimate increases monotonically over the iterations, i.e. $\ell(\theta^{(t)}) \le \ell(\theta^{(t+1)})$. Suppose $\theta^{(t)}$ and $\theta^{(t+1)}$ are the results after the $t$-th and $(t+1)$-th iterations of the EM algorithm. Selecting $Q_i^{(t)}(z^{(i)}) = P(z^{(i)} \mid x^{(i)}; \theta^{(t)})$, the iteration is:
- E-step: $Q_i^{(t)}(z^{(i)}) = P(z^{(i)} \mid x^{(i)}; \theta^{(t)})$;
- M-step: $\theta^{(t+1)} = \arg\max_{\theta} \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{P(x^{(i)}, z^{(i)}; \theta)}{Q_i^{(t)}(z^{(i)})}$.
Fixing $Q_i^{(t)}$ and regarding $\theta$ as the variable:

$$\begin{aligned}
\ell(\theta^{(t+1)}) &\ge \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{P(x^{(i)}, z^{(i)}; \theta^{(t+1)})}{Q_i^{(t)}(z^{(i)})} \\
&\ge \sum_i \sum_{z^{(i)}} Q_i^{(t)}(z^{(i)}) \log \frac{P(x^{(i)}, z^{(i)}; \theta^{(t)})}{Q_i^{(t)}(z^{(i)})} \\
&= \ell(\theta^{(t)}).
\end{aligned}$$

In the above, the first inequality holds because the Jensen lower bound holds for any distribution $Q_i$ and any $\theta$, in particular for $\theta^{(t+1)}$; the second holds because $\theta^{(t+1)}$ is chosen to maximize that lower bound; and the final equality holds because $Q_i^{(t)}$ was chosen to make Jensen's inequality tight at $\theta^{(t)}$. Hence the likelihood never decreases from one iteration to the next.

VI. A case study: solving for parameters with the EM algorithm
Suppose there is a batch of data generated by a mixture of two normal distributions,

$$N(\mu_1, \sigma^2) \quad \text{and} \quad N(\mu_2, \sigma^2),$$

where $\mu_1$ and $\mu_2$ are unknown and $\sigma$ is known. We do not know which of the two distributions generated any particular sample $x_i$; this membership can be expressed with indicator variables $z_{i1}$ and $z_{i2}$. This is a typical latent-variable problem: the latent variables are $z_{i1}$ and $z_{i2}$. The means $\mu_1$ and $\mu_2$ can be estimated with the EM algorithm.
- First, initialize $\mu_1$ and $\mu_2$;
- E-step: compute the probability that sample $x_i$ was generated by the $j$-th distribution:
$$E[z_{ij}] = \frac{\exp\!\left(-\frac{(x_i - \mu_j)^2}{2\sigma^2}\right)}{\sum_{k=1}^{2} \exp\!\left(-\frac{(x_i - \mu_k)^2}{2\sigma^2}\right)};$$
- M-step: maximize the expected value. The parameters we need are the means, which can be estimated as:
$$\mu_j = \frac{\sum_{i=1}^{N} E[z_{ij}]\, x_i}{\sum_{i=1}^{N} E[z_{ij}]}.$$
Python code
#coding:UTF-8
'''
Created on June 7, 2015
@author: Zhaozhiyong
'''
import numpy as np

# First, generate some samples for testing.
# Specify the parameters of the two Gaussian distributions;
# the two distributions share the same variance.
sigma = 6
miu_1 = 40   # true mean of the first Gaussian (consistent with the final result below)
miu_2 = 20   # true mean of the second Gaussian

# Choose one of the two Gaussians uniformly at random to generate each sample value.
N = 1000     # sample size (assumed; the original listing dropped this value)
X = np.zeros((1, N))
for i in range(N):
    if np.random.random() > 0.5:  # random from the numpy module
        X[0, i] = np.random.randn() * sigma + miu_1
    else:
        X[0, i] = np.random.randn() * sigma + miu_2
# The samples have now been generated.

# For the generated samples, estimate the means miu with the EM algorithm.
# Initial values for miu.
k = 2
miu = np.random.random((1, k))
# miu = np.mat([40.0, 20.0])
Expectations = np.zeros((N, k))

for step in range(1000):  # maximum number of iterations (assumed value)
    # Step 1: compute the expectations (E-step).
    for i in range(N):
        # Compute the denominator.
        denominator = 0
        for j in range(k):
            denominator += np.exp(-1 / (2 * sigma ** 2) * (X[0, i] - miu[0, j]) ** 2)
        # Compute the numerators.
        for j in range(k):
            numerator = np.exp(-1 / (2 * sigma ** 2) * (X[0, i] - miu[0, j]) ** 2)
            Expectations[i, j] = numerator / denominator

    # Step 2: maximize the expectation (M-step).
    oldMiu = np.zeros((1, k))
    for j in range(k):
        oldMiu[0, j] = miu[0, j]
        numerator = 0
        denominator = 0
        for i in range(N):
            numerator += Expectations[i, j] * X[0, i]
            denominator += Expectations[i, j]
        miu[0, j] = numerator / denominator

    # Check whether the convergence requirement is met.
    epsilon = 0.0001
    if np.sum(np.abs(miu - oldMiu)) < epsilon:
        break
    print(step)
    print(miu)
print(miu)
Final result
[[40.49487592 19.96497512]]
References:
1. (EM algorithm) The EM Algorithm: http://www.cnblogs.com/jerrylead/archive/2011/04/06/2006936.html
2. Mathematical expectation: http://wenku.baidu.com/view/915a9c1ec5da50e2524d7f08.html?re=view