1. Maximum likelihood and maximum probability
Since I do not come from a statistics background, when I first encountered maximum likelihood I always found it strange: why is it called maximum likelihood rather than maximum probability?
I later learned that maximum likelihood is used to estimate unknown parameters, while "maximum probability" better describes solving for the value of a variable that has the highest probability when the parameters are already known. For example:
max L(θ) = θ1·x1 + θ2·x2 + θ3·x3
max p(x) = θ1·x1 + θ2·x2 + θ3·x3
max L(θ) estimates the parameter θ when several groups of observed samples x are available, while max p(x) is the opposite: given a known θ, it asks which x yields the maximum probability p.
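As a small sketch of this distinction (the Bernoulli coin below is my own illustrative example, not the linear expression above), the two questions differ only in what is treated as unknown:

```python
# Toy Bernoulli model with bias theta = P(x = 1).
# Question 1 (max L(theta)): theta is unknown, samples are known.
# Question 2 (max p(x)):     theta is known, the outcome x is unknown.

def mle_theta(samples):
    """max L(theta): for a Bernoulli model the likelihood-maximizing
    theta is simply the fraction of 1s in the observed samples."""
    return sum(samples) / len(samples)

def most_probable_x(theta):
    """max p(x): given theta, pick the outcome with the higher
    probability under p(x=1) = theta, p(x=0) = 1 - theta."""
    return 1 if theta >= 0.5 else 0

samples = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # observed coin flips
theta_hat = mle_theta(samples)             # estimated bias: 0.7
print(theta_hat, most_probable_x(theta_hat))
```

Here estimation and prediction reuse the same model: first the samples pin down θ, then the fitted θ answers which x is most probable.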
2. Maximum likelihood and unsupervised learning
With the first point understood, let's look at the relationship between maximum likelihood and unsupervised learning.
Here "unsupervised" refers to unsupervised learning in machine learning. For example, suppose we know that a set of variables X follows a Gaussian (normal) distribution; how do we estimate the Gaussian parameters μ and σ?
Take the heights of students at a school: they follow a Gaussian distribution, with few very short or very tall students and most clustered in the middle. Exam scores usually fit a Gaussian distribution as well, with few students at either extreme. But we do not know μ and σ, so we cannot construct the overall distribution function. Now suppose a student asks the head teacher: where does my score rank in the province?
The head teacher only has the school's samples and does not know how everyone in the province scored. He does know, however, that the scores of both the school and the province follow Gaussian distributions. How can he estimate that Gaussian distribution?
Being a math teacher, he quickly came up with a solution: use the existing samples X to estimate the Gaussian parameters μ and σ!
That is exactly max L(θ): maximum likelihood estimation.
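Under the story's assumption that scores are Gaussian, the teacher's method can be sketched in a few lines of Python (the sample scores below are hypothetical; for a Gaussian, the MLE has a closed form):

```python
import math

def gaussian_mle(scores):
    """Maximum-likelihood estimates for a Gaussian: the MLE for mu is
    the sample mean, and the MLE for sigma divides by n (not n - 1)."""
    n = len(scores)
    mu = sum(scores) / n
    sigma = math.sqrt(sum((s - mu) ** 2 for s in scores) / n)
    return mu, sigma

def percentile(x, mu, sigma):
    """Gaussian CDF via erf: the estimated fraction of scores
    below x under the fitted distribution."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

scores = [72, 85, 60, 90, 78, 66, 81, 74]   # hypothetical school sample
mu, sigma = gaussian_mle(scores)
print(mu, sigma)
# Estimated provincial standing of a student who scored 88:
print(percentile(88, mu, sigma))
```

Once μ and σ are fitted, the teacher can answer the student's question by reading off where the score falls in the fitted distribution.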
This is, in fact, an unsupervised machine learning method. Suppose our learning problem is to learn the parameters of the Gaussian distribution of exam scores. We only have the observed samples (the variables); there are no observed outputs, that is, no results of substituting the variables into the Gaussian distribution. In this example, such an output would be a probability value, which corresponds to a score's position in the distribution.
To sum up: when we have only observed samples, no labeled results, and a known assumed distribution family, we can use maximum likelihood estimation to find the optimal parameters of the assumed distribution function under that set of observations.
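When the distribution has no convenient closed-form estimator, the same idea still works by maximizing the log-likelihood directly. A minimal sketch (the observations and the grid search are my own example, with σ fixed at 1 for simplicity):

```python
import math

def log_likelihood(mu, sigma, xs):
    """Gaussian log-likelihood of the observed samples xs."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2) for x in xs)

xs = [4.8, 5.1, 5.0, 4.9, 5.2]          # hypothetical observations
# Grid search over candidate means; pick the one with the highest
# log-likelihood. For a Gaussian this recovers the sample mean.
candidates = [i / 10 for i in range(40, 61)]
best_mu = max(candidates, key=lambda m: log_likelihood(m, 1.0, xs))
print(best_mu)   # close to the sample mean, 5.0
```

In practice a proper optimizer would replace the grid search, but the principle is the same: choose the parameters under which the observed samples are most likely.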
Related information:
Maximum Likelihood Estimation of the Normal Distribution: http://www.dwz.cn/v25v2
Maximum Likelihood Estimation: http://blog.sciencenet.cn/home.php?mod=space&uid=491809&do=blog&id=400893
When reprinting, please indicate the source: http://www.cnblogs.com/breakthings/p/4058543.html