A simple and easy-to-learn machine learning algorithm--EM algorithm


I. The problem of parameter estimation in machine learning

In an earlier post in this series, "A simple and easy-to-learn machine learning algorithm -- logistic regression", the model parameters were estimated by maximizing the likelihood function. In a nutshell, logistic regression is a supervised learning problem over a set of samples, where each sample contains both training features and a label. To solve for the parameters of logistic regression, we construct the probabilities that a sample belongs to category $1$ and to category $0$:

$$P(y=1\mid x;\theta)=h_\theta(x),\qquad P(y=0\mid x;\theta)=1-h_\theta(x),$$

where $h_\theta(x)$ is the logistic (sigmoid) function of $\theta^{T}x$. This gives a single probability function over the two classes:

$$P(y\mid x;\theta)=\left(h_\theta(x)\right)^{y}\left(1-h_\theta(x)\right)^{1-y}.$$

At this point, the parameters of the model can be estimated by maximum likelihood estimation. If, however, the labels are unknown, they become latent (hidden) variables, as in unsupervised learning problems such as the K-Means clustering algorithm, and the parameters of the model can no longer be estimated directly by maximum likelihood.
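To make the contrast concrete, the following is a minimal sketch of maximum likelihood estimation for logistic regression when the labels are observed (my own illustration, not from the original article; the function names and the use of scipy.optimize are assumptions):

# Sketch: maximum likelihood for logistic regression with observed labels.
# Names (neg_log_likelihood, fit_logistic) and the use of scipy.optimize.minimize
# are illustrative assumptions, not taken from the original post.
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_likelihood(theta, X, y):
    # -L(theta) = -sum_i [ y_i * log h(x_i) + (1 - y_i) * log(1 - h(x_i)) ]
    h = sigmoid(X.dot(theta))
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def fit_logistic(X, y):
    theta0 = np.zeros(X.shape[1])
    res = minimize(neg_log_likelihood, theta0, args=(X, y))
    return res.x

if __name__ == "__main__":
    rng = np.random.RandomState(0)
    X = np.c_[np.ones(200), rng.randn(200, 2)]      # intercept column plus two features
    true_theta = np.array([0.5, 2.0, -1.0])
    y = (rng.rand(200) < sigmoid(X.dot(true_theta))).astype(float)
    print(fit_logistic(X, y))   # labels are observed, so plain MLE works

When the labels y are hidden, this direct maximization is no longer possible, which is exactly the situation the EM algorithm addresses.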

II. Introduction to the EM algorithm

For problems with latent variables, the model parameters cannot be obtained directly by maximum likelihood estimation; the EM algorithm is an effective method for solving such latent-variable optimization problems. EM is short for Expectation-Maximization. It is an iterative algorithm, and each iteration consists of two main steps: the expectation (E) step and the maximization (M) step.

III. Preparation for the derivation of the EM algorithm

1. Convex functions

Let $f$ be a function defined on the real domain. If for any real number $x$

$$f''(x)\geq 0,$$

then $f$ is a convex function. If the argument is not a single real number but a vector of real numbers, then $f$ is convex if the Hessian matrix $H$ of $f$ is positive semi-definite, that is,

$$H\succeq 0.$$

In particular, if $f''(x)>0$ or $H\succ 0$, then $f$ is called a strictly convex function.
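As a quick illustration (my own sketch, not from the original article), convexity of a function of a vector argument can be checked by inspecting the eigenvalues of its Hessian; the example below uses $f(x,y)=x^2+y^2$, whose Hessian is constant:

# Sketch (illustrative only): check convexity of f(x, y) = x**2 + y**2
# by verifying that its Hessian is positive semi-definite.
import numpy as np

# Hessian of f(x, y) = x^2 + y^2 is constant: [[2, 0], [0, 2]]
H = np.array([[2.0, 0.0],
              [0.0, 2.0]])

eigenvalues = np.linalg.eigvalsh(H)
print(eigenvalues)                   # [2. 2.]
print(np.all(eigenvalues >= 0))      # True -> f is convex (here strictly convex)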

2. Jensen's inequality

If $f$ is a convex function and $X$ is a random variable, then

$$E\left[f(X)\right]\geq f\left(E[X]\right).$$

In particular, if $f$ is a strictly convex function, then $E[f(X)]=f(E[X])$ holds if and only if

$$P\left(X=E[X]\right)=1,$$

that is, the random variable $X$ is a constant.

(Figure: illustration of Jensen's inequality for a convex function; image from reference article 1.)

Note: if $f$ is a concave function, the direction of the inequality above is reversed.
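A quick numerical check of Jensen's inequality (my own sketch, not part of the original article), using the convex function $f(x)=x^2$ and the concave function $\log$:

# Sketch: empirically verify Jensen's inequality E[f(X)] >= f(E[X])
# for the convex function f(x) = x**2. Illustrative only.
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100000)          # samples of a random variable X

lhs = np.mean(X ** 2)          # E[f(X)]
rhs = np.mean(X) ** 2          # f(E[X])
print(lhs, rhs, lhs >= rhs)    # lhs ~ 1.0, rhs ~ 0.0, True

# For the concave function log the inequality is reversed: E[log Y] <= log E[Y]
Y = rng.rand(100000) + 0.5
print(np.mean(np.log(Y)) <= np.log(np.mean(Y)))   # True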

3. Mathematical expectation

3.1 Mathematical expectation of a random variable

If the probability distribution of a discrete random variable $X$ is

$$P(X=x_k)=p_k,\quad k=1,2,\ldots$$

and the series $\sum_{k}x_k p_k$ converges absolutely, then its sum is called the mathematical expectation of $X$, written as

$$E(X)=\sum_{k} x_k\,p_k.$$

If the probability density function of a continuous random variable $X$ is $f(x)$, then its mathematical expectation is

$$E(X)=\int_{-\infty}^{+\infty} x\,f(x)\,dx.$$

3.2 Mathematical expectation of a function of a random variable

Let $Y=g(X)$ be a function of the random variable $X$. If $X$ is a discrete random variable with probability distribution

$$P(X=x_k)=p_k,\quad k=1,2,\ldots$$

then

$$E(Y)=E\left[g(X)\right]=\sum_{k} g(x_k)\,p_k.$$

If $X$ is a continuous random variable with probability density function $f(x)$, then

$$E(Y)=E\left[g(X)\right]=\int_{-\infty}^{+\infty} g(x)\,f(x)\,dx.$$
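As a small numerical illustration (my own sketch, not from the original article; the distributions used here are assumptions), both kinds of expectation can be approximated directly:

# Sketch: mathematical expectation of a random variable and of a function of it.
# The distributions chosen below are illustrative assumptions.
import numpy as np

# Discrete case: P(X = 1) = 0.2, P(X = 2) = 0.5, P(X = 3) = 0.3
x = np.array([1.0, 2.0, 3.0])
p = np.array([0.2, 0.5, 0.3])
print(np.sum(x * p))                 # E(X) = 2.1
print(np.sum((x ** 2) * p))          # E[g(X)] with g(x) = x^2

# Continuous case: standard normal density, E(X) and E(X^2) by numerical integration
t = np.linspace(-10.0, 10.0, 200001)
dt = t[1] - t[0]
f = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)   # standard normal density
print(np.sum(t * f) * dt)            # E(X)   ~ 0.0
print(np.sum((t ** 2) * f) * dt)     # E(X^2) ~ 1.0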

IV. The solution process of the EM algorithm

Suppose the observed variables are denoted $x=(x^{(1)},\ldots,x^{(m)})$ and the latent variables are denoted $z=(z^{(1)},\ldots,z^{(m)})$, so that $(x,z)$ is the complete data; $\theta$ is the parameter to be estimated, and the likelihood function of the complete data is $P(x,z;\theta)$. To estimate the parameters, we would like to apply maximum likelihood estimation to the given observation data. Because the latent variables $z$ are unknown, we can only apply maximum likelihood to the observed data, that is, we need to maximize

$$L(\theta)=\sum_{i=1}^{m}\log P\left(x^{(i)};\theta\right)=\sum_{i=1}^{m}\log\sum_{z^{(i)}}P\left(x^{(i)},z^{(i)};\theta\right).$$

This expression cannot be maximized directly, because the function contains the latent variables $z^{(i)}$, which are unknown. If we could determine the values (or the distribution) of the latent variables, we could maximize over the parameters; by repeatedly re-estimating the latent variables and re-maximizing, we obtain ever larger values. This is the idea of the EM algorithm: the parameters are obtained by iteration.

First, assign initial values to the parameters and begin iterating. Suppose the estimate of the parameters after the $j$-th iteration is $\theta^{(j)}$. Introducing for each $i$ a distribution $Q_i$ over the latent variable $z^{(i)}$, with $\sum_{z^{(i)}}Q_i(z^{(i)})=1$, the log-likelihood can be written as

$$L(\theta)=\sum_{i}\log\sum_{z^{(i)}}P\left(x^{(i)},z^{(i)};\theta\right)=\sum_{i}\log\sum_{z^{(i)}}Q_i\left(z^{(i)}\right)\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{Q_i\left(z^{(i)}\right)}\geq\sum_{i}\sum_{z^{(i)}}Q_i\left(z^{(i)}\right)\log\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{Q_i\left(z^{(i)}\right)}.$$

The step from the second expression to the third uses Jensen's inequality: because $\log$ is a concave function, $\log E[Y]\geq E[\log Y]$, where the expectation is taken with respect to $Q_i$, the distribution that the latent variable satisfies, and $Y$ stands for the ratio $P(x^{(i)},z^{(i)};\theta)/Q_i(z^{(i)})$. The value of this lower bound therefore depends on the two probabilities $Q_i(z^{(i)})$ and $P(x^{(i)},z^{(i)};\theta)$. During the iterations these two probabilities are adjusted so that the lower bound keeps rising, and in this way the maximum of $L(\theta)$ is approached.

Note when equality holds: by Jensen's inequality, equality requires the random variable to be a constant, i.e.

$$\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{Q_i\left(z^{(i)}\right)}=c$$

for some constant $c$ that does not depend on $z^{(i)}$. Since it is known that $\sum_{z^{(i)}}Q_i(z^{(i)})=1$, summing gives $\sum_{z^{(i)}}P(x^{(i)},z^{(i)};\theta)=c$, and therefore

$$Q_i\left(z^{(i)}\right)=\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{\sum_{z}P\left(x^{(i)},z;\theta\right)}=\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{P\left(x^{(i)};\theta\right)}=P\left(z^{(i)}\mid x^{(i)};\theta\right).$$

So far we have obtained the form of the distribution that the latent variable satisfies: the posterior of $z^{(i)}$ given $x^{(i)}$ under the current parameters. This is the E-step of the EM algorithm. After $Q_i$ has been determined, the parameters $\theta$ are adjusted to maximize the lower bound; this is the M-step. The steps of the EM algorithm are:
    1. Initialize the parameters $\theta^{(0)}$ and begin the iteration;
    2. E-step: let $\theta^{(j)}$ be the parameter estimate from the $j$-th iteration; in the $(j+1)$-th iteration, compute
       $$Q_i\left(z^{(i)}\right)=P\left(z^{(i)}\mid x^{(i)};\theta^{(j)}\right);$$
    3. M-step: maximize the lower bound to obtain the parameter estimate for the $(j+1)$-th iteration:
       $$\theta^{(j+1)}=\arg\max_{\theta}\sum_{i}\sum_{z^{(i)}}Q_i\left(z^{(i)}\right)\log\frac{P\left(x^{(i)},z^{(i)};\theta\right)}{Q_i\left(z^{(i)}\right)}.$$
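To make the structure of the two steps concrete in code, here is a minimal, generic sketch of the EM loop (my own illustration, not from the original article; the helper names run_em, e_step and m_step are assumptions). The Gaussian-mixture example in the last section fills in these two steps concretely.

# Sketch of the generic EM loop. e_step / m_step are placeholders that a concrete
# model (e.g. the Gaussian-mixture example below) must supply; names are assumptions.
import numpy as np

def run_em(x, theta, e_step, m_step, max_iter=100, tol=1e-4):
    """Iterate E and M steps until the parameter estimate stops changing."""
    for step in range(max_iter):
        q = e_step(x, theta)          # E-step: posterior of the latent variables
        new_theta = m_step(x, q)      # M-step: maximize the lower bound over theta
        if np.sum(np.abs(new_theta - theta)) < tol:
            theta = new_theta
            break
        theta = new_theta
    return theta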
V. Convergence of the EM algorithm

Does the iterative process guarantee that a maximum of the likelihood is eventually found? To answer this, it suffices to show that the likelihood objective increases monotonically during the whole iterative process, i.e. $L(\theta^{(j+1)})\geq L(\theta^{(j)})$. Suppose $\theta^{(j)}$ and $\theta^{(j+1)}$ are the results after the $j$-th and the $(j+1)$-th iterations of the EM algorithm. Having selected $Q_i^{(j)}(z^{(i)})=P(z^{(i)}\mid x^{(i)};\theta^{(j)})$, the iteration proceeds as:
    1. E-step: with this choice of $Q_i^{(j)}$, Jensen's inequality holds with equality, so
       $$L\left(\theta^{(j)}\right)=\sum_{i}\sum_{z^{(i)}}Q_i^{(j)}\left(z^{(i)}\right)\log\frac{P\left(x^{(i)},z^{(i)};\theta^{(j)}\right)}{Q_i^{(j)}\left(z^{(i)}\right)}.$$
    2. M-step: fix $Q_i^{(j)}$ and regard $\theta$ as the variable; $\theta^{(j+1)}$ is chosen to maximize the lower bound, so
       $$L\left(\theta^{(j+1)}\right)\geq\sum_{i}\sum_{z^{(i)}}Q_i^{(j)}\left(z^{(i)}\right)\log\frac{P\left(x^{(i)},z^{(i)};\theta^{(j+1)}\right)}{Q_i^{(j)}\left(z^{(i)}\right)}\geq\sum_{i}\sum_{z^{(i)}}Q_i^{(j)}\left(z^{(i)}\right)\log\frac{P\left(x^{(i)},z^{(i)};\theta^{(j)}\right)}{Q_i^{(j)}\left(z^{(i)}\right)}=L\left(\theta^{(j)}\right).$$

In the chain above, the first inequality holds because the sum is a lower bound on $L(\theta)$ for any $\theta$ (by Jensen's inequality), and the second holds because $\theta^{(j+1)}$ maximizes that lower bound over $\theta$. Hence the likelihood is non-decreasing across iterations.
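The monotonicity can also be checked numerically. The sketch below (my own illustration, anticipating the two-Gaussian example of the next section; the sample size and parameter values are assumptions) runs EM with a known, shared variance and asserts that the observed-data log-likelihood never decreases:

# Sketch (illustrative assumptions): track the observed-data log-likelihood across
# EM iterations for a two-component Gaussian mixture with known, equal variance,
# and check that it never decreases. Mixing weights are fixed at 1/2 for simplicity.
import numpy as np

rng = np.random.RandomState(0)
sigma = 6.0
x = np.concatenate([rng.randn(500) * sigma + 40.0,
                    rng.randn(500) * sigma + 20.0])

def log_likelihood(x, mu, sigma):
    # log p(x) = sum_i log( (1/2) N(x_i; mu_1, sigma^2) + (1/2) N(x_i; mu_2, sigma^2) )
    dens = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma ** 2))
    dens /= np.sqrt(2 * np.pi) * sigma
    return np.sum(np.log(0.5 * dens.sum(axis=1)))

mu = np.array([1.0, 2.0])                 # deliberately poor initial guess
prev = -np.inf
for step in range(50):
    # E-step: responsibilities of each component for each point
    w = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted means
    mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    cur = log_likelihood(x, mu, sigma)
    assert cur >= prev - 1e-6             # log-likelihood is non-decreasing
    prev = cur
print(mu, prev)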

VI. A worked example of parameter estimation with the EM algorithm

Suppose we have a batch of data generated by a mixture of two normal distributions,

$$N\left(\mu_1,\sigma^2\right)\quad\text{and}\quad N\left(\mu_2,\sigma^2\right),$$

where the means $\mu_1$ and $\mu_2$ are unknown and the variance $\sigma^2$ is known and shared by both distributions. For each data point we do not know which of the two distributions generated it; this membership can be expressed with indicator variables $z_1$ and $z_2$. This is a typical example of a latent-variable problem: the latent variables are $z_1$ and $z_2$, and the parameters $\mu_1,\mu_2$ can be estimated with the EM algorithm.

    1. First initialize $\mu_1$ and $\mu_2$;
    2. E-step: compute the probability that data point $x_i$ was generated by the $j$-th distribution,
       $$E_{ij}=\frac{\exp\left(-\frac{1}{2\sigma^2}\left(x_i-\mu_j\right)^2\right)}{\sum_{k=1}^{2}\exp\left(-\frac{1}{2\sigma^2}\left(x_i-\mu_k\right)^2\right)};$$
    3. M-step: maximize the expectation. Since the only parameters required here are the means, they can be estimated as
       $$\mu_j=\frac{\sum_{i}E_{ij}\,x_i}{\sum_{i}E_{ij}}.$$

Python code

#coding: UTF-8
'''
Created on June 7, 2015
@author: Zhaozhiyong
'''
from __future__ import division
from numpy import *
import math as mt

# First generate some samples for testing.
# Specify the parameters of the two Gaussian distributions;
# the two distributions share the same variance.
sigma = 6
miu_1 = 40  # assumed value, consistent with the commented-out initialization and the final result
miu_2 = 20  # assumed value, consistent with the commented-out initialization and the final result
# Randomly (uniformly) choose one of the two Gaussians to generate each sample value.
N = 1000  # number of samples (assumed value)
x = zeros((1, N))
for i in range(N):
    if random.random() > 0.5:  # random here is numpy's random module
        x[0, i] = random.randn() * sigma + miu_1
    else:
        x[0, i] = random.randn() * sigma + miu_2
# The samples have now been generated.
# For the generated samples, estimate the means miu with the EM algorithm.
# Initial value of miu.
k = 2
miu = random.random((1, k))
# miu = mat([40.0, 20.0])
expectations = zeros((N, k))
for step in range(1000):  # maximum number of iterations (assumed value)
    # Step 1 (E-step): compute the expectations.
    for i in range(N):
        # Compute the denominator.
        denominator = 0
        for j in range(k):
            denominator = denominator + mt.exp(-1 / (2 * sigma ** 2) * (x[0, i] - miu[0, j]) ** 2)
        # Compute the numerator for each component.
        for j in range(k):
            numerator = mt.exp(-1 / (2 * sigma ** 2) * (x[0, i] - miu[0, j]) ** 2)
            expectations[i, j] = numerator / denominator
    # Step 2 (M-step): maximize the expectation.
    # oldMiu = miu
    oldMiu = zeros((1, k))
    for j in range(k):
        oldMiu[0, j] = miu[0, j]
        numerator = 0
        denominator = 0
        for i in range(N):
            numerator = numerator + expectations[i, j] * x[0, i]
            denominator = denominator + expectations[i, j]
        miu[0, j] = numerator / denominator
    # Check whether the convergence criterion is satisfied.
    epsilon = 0.0001
    if abs(miu - oldMiu).sum() < epsilon:
        break
    print(step)
    print(miu)
print(miu)


Final result

[[40.49487592 19.96497512]]

Reference articles:

1. (EM algorithm) The EM Algorithm: http://www.cnblogs.com/jerrylead/archive/2011/04/06/2006936.html

2. Mathematical expectation: http://wenku.baidu.com/view/915a9c1ec5da50e2524d7f08.html?re=view
