The full name of GMM is Gaussian Mixture Model. Like K-means, GMM is a common clustering algorithm; the main difference is that GMM is a "soft clustering" algorithm, which gives the probability that each sample belongs to each cluster center. Because of this property, GMM is widely used in image segmentation and speech processing.
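As a minimal illustration of the difference (a sketch assuming scikit-learn and arbitrary synthetic data, not part of the original article), K-means returns one hard label per sample, while a GMM returns a probability for each component:

```python
# Hard vs. soft assignments on synthetic 1-D data (illustrative sketch).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200),
                    rng.normal(3.0, 1.0, 200)]).reshape(-1, 1)

hard_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x)            # one label per sample
soft_probs = GaussianMixture(n_components=2, random_state=0).fit(x).predict_proba(x)    # (n, 2), rows sum to 1

print(hard_labels[:3])
print(soft_probs[:3].round(3))
```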
Running K-means on n samples yields K center points, while running GMM yields K Gaussian distributions. Formula 1 gives a single Gaussian distribution, in which θ denotes the parameters of ϕ(x∣θ) (here the mean μ and standard deviation σ).
\phi(x \mid \theta) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \quad \text{(Formula 1)}
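Formula 1 maps directly to code. A minimal NumPy sketch (the function name `phi` is chosen here for illustration):

```python
import numpy as np

def phi(x, mu, sigma):
    """One-dimensional Gaussian density phi(x | theta) with theta = (mu, sigma) (Formula 1)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
```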
Since GMM yields K Gaussian distributions, the mixture can be expressed as Formula 2.
P(x \mid \theta) = \sum_{k=1}^{K} w_k \, \phi(x \mid \theta_k) \quad \text{(Formula 2)}
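A corresponding sketch of Formula 2; `w`, `mu`, and `sigma` are assumed to be length-K arrays of mixture weights, means, and standard deviations (illustrative names):

```python
import numpy as np

def mixture_pdf(x, w, mu, sigma):
    """P(x | theta) = sum_k w_k * phi(x | theta_k) (Formula 2)."""
    x = np.asarray(x, dtype=float)
    # Evaluate each of the K components for every sample: shape (n, K).
    comp = np.exp(-(x[:, None] - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return comp @ w  # weighted sum over components, shape (n,)
```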
Here w_k is the probability that component ϕ(x∣θ_k) is selected, so ∑_{k=1}^{K} w_k = 1 and w_k ≥ 0. The next task is to estimate the optimal set of parameters μ, σ, and w, which is done with maximum likelihood estimation. Before solving, we make a strong assumption: all sample data are independent of each other. We can then write the log-likelihood function shown in Formula 3.
L(\theta) = \sum_{i=1}^{n} \log p(x_i \mid \theta) = \sum_{i=1}^{n} \log \sum_{k=1}^{K} w_k \, \phi(x_i \mid \theta_k) \quad \text{(Formula 3)}
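Formula 3 can then be evaluated by summing the log of the mixture density over all samples (a sketch reusing the component evaluation above):

```python
import numpy as np

def log_likelihood(x, w, mu, sigma):
    """L(theta) = sum_i log sum_k w_k * phi(x_i | theta_k) (Formula 3)."""
    x = np.asarray(x, dtype=float)
    comp = np.exp(-(x[:, None] - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return np.sum(np.log(comp @ w))
```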
The optimal value to estimate is \hat{\theta} = \arg\max_{\theta} L(\theta). For the equation shown in Formula 3, it is difficult to obtain the maximum by directly setting the derivative to zero, so the EM algorithm is used instead; the principle of the EM algorithm is given in Appendix A.
Jensen's inequality is used here to construct a lower-bound function; see Appendix B for a description of Jensen's inequality. Since f(x) = log(x) is a concave function, the direction of the inequality in Appendix B is reversed here. Let γ_ik denote the probability that sample x_i belongs to the k-th component, as defined in Formula 4. The derivation shown in Formula 5 then follows.
\gamma_{ik} = \frac{w_k \, \phi(x_i \mid \theta_k)}{\sum_{k=1}^{K} w_k \, \phi(x_i \mid \theta_k)} \quad \text{(Formula 4)}

\begin{aligned}
L(\theta) &= \sum_{i=1}^{n} \log \sum_{k=1}^{K} w_k \, \phi(x_i \mid \theta_k)
= \sum_{i=1}^{n} \log \sum_{k=1}^{K} \gamma_{ik} \frac{w_k \, \phi(x_i \mid \theta_k)}{\gamma_{ik}} \\
&= \sum_{i=1}^{n} \log E\!\left[\frac{w_k \, \phi(x_i \mid \theta_k)}{\gamma_{ik}}\right]
\ge \sum_{i=1}^{n} E\!\left[\log \frac{w_k \, \phi(x_i \mid \theta_k)}{\gamma_{ik}}\right]
= \sum_{i=1}^{n} \sum_{k=1}^{K} \gamma_{ik} \log \frac{w_k \, \phi(x_i \mid \theta_k)}{\gamma_{ik}}
\end{aligned} \quad \text{(Formula 5)}
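Formula 4 is what is usually called the E-step: given the current parameters, compute the responsibility of each component for each sample. A sketch with the same illustrative array conventions as above:

```python
import numpy as np

def e_step(x, w, mu, sigma):
    """gamma_ik = w_k * phi(x_i | theta_k) / sum_k w_k * phi(x_i | theta_k) (Formula 4)."""
    x = np.asarray(x, dtype=float)
    comp = np.exp(-(x[:, None] - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    weighted = comp * w                                     # w_k * phi(x_i | theta_k), shape (n, K)
    return weighted / weighted.sum(axis=1, keepdims=True)   # each row sums to 1
```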
To facilitate the subsequent derivation, Formula 6 is used to denote the right-hand side of the inequality above.
H(w, \mu, \sigma) = \sum_{i=1}^{n} \sum_{k=1}^{K} \gamma_{ik} \log \frac{w_k \, \phi(x_i \mid \theta_k)}{\gamma_{ik}} \quad \text{(Formula 6)}
Taking the partial derivatives of H with respect to μ_k and σ_k respectively:
\frac{\partial H(w,\mu,\sigma)}{\partial \sigma_k} = \sum_{i=1}^{n} \gamma_{ik} \left[ \frac{(x_i - \mu_k)^2}{\sigma_k^3} - \frac{1}{\sigma_k} \right], \qquad
\frac{\partial H(w,\mu,\sigma)}{\partial \mu_k} = \sum_{i=1}^{n} \gamma_{ik} \frac{x_i - \mu_k}{\sigma_k^2}
Setting the derivatives to zero gives the maximum in the current iteration, at which point
\sigma_k^2 = \frac{\sum_{i=1}^{n} \gamma_{ik} (x_i - \mu_k)^2}{\sum_{i=1}^{n} \gamma_{ik}}, \qquad
\mu_k = \frac{\sum_{i=1}^{n} \gamma_{ik} x_i}{\sum_{i=1}^{n} \gamma_{ik}}
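In code, the two closed-form updates look like the following sketch, where `gamma` is the (n, K) responsibility matrix produced by the E-step (names are illustrative):

```python
import numpy as np

def update_mu_sigma(x, gamma):
    """mu_k and sigma_k updates obtained by setting the partial derivatives of H to zero."""
    x = np.asarray(x, dtype=float)
    nk = gamma.sum(axis=0)                                    # sum_i gamma_ik, shape (K,)
    mu = (gamma * x[:, None]).sum(axis=0) / nk                # mu_k
    var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk   # sigma_k^2
    return mu, np.sqrt(var)
```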
For w_k, the Lagrange multiplier method (with the constraint ∑_{k=1}^{K} w_k = 1) can be used to find the maximum; the derivation is omitted here and the result is given directly:
w_k = \frac{1}{n} \sum_{i=1}^{n} \gamma_{ik}
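In the same sketch, this update is a one-liner:

```python
def update_w(gamma):
    """w_k = (1/n) * sum_i gamma_ik."""
    return gamma.mean(axis=0)
```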
The new μ_k, σ_k, and w_k are substituted into H(w, μ, σ) to obtain a new value, and iteration stops when the change is less than a threshold. Once iteration stops, we obtain the K Gaussian distributions that satisfy the condition, together with their parameters.
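Putting the pieces together, a complete one-dimensional EM loop might look like the sketch below. The initialization, iteration cap, and tolerance are arbitrary choices, and the stopping criterion used here is the change in the log-likelihood of Formula 3, a common stand-in for monitoring H directly:

```python
import numpy as np

def em_gmm(x, k, n_iter=100, tol=1e-6, seed=0):
    """Minimal EM for a 1-D Gaussian mixture with k components (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Crude initialization: random samples as means, global std, uniform weights.
    mu = rng.choice(x, size=k, replace=False)
    sigma = np.full(k, x.std())
    w = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities gamma_ik (Formula 4).
        comp = np.exp(-(x[:, None] - mu) ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
        weighted = comp * w
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # Log-likelihood under the current parameters (Formula 3); stop when it barely improves.
        ll = np.sum(np.log(weighted.sum(axis=1)))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: closed-form updates for w_k, mu_k, sigma_k.
        nk = gamma.sum(axis=0)
        w = nk / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((gamma * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return w, mu, sigma

# Example on synthetic data drawn from two Gaussians (arbitrary parameters):
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(4.0, 0.5, 300)])
print(em_gmm(data, k=2))
```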
Appendix A: The EM algorithm

For a strictly concave function f(θ) whose maximum is required, one can first define, at θ_t, a lower-bound function g_{θ_t}(θ) ≤ f(θ), with equality if and only if θ = θ_t, i.e. g_{θ_t}(θ_t) = f(θ_t). Let θ_{t+1} = argmax_θ g_{θ_t}(θ); then the following must hold:
f(\theta_{t+1}) \ge g_{\theta_t}(\theta_{t+1}) \ge g_{\theta_t}(\theta_t) = f(\theta_t)
We then define a new lower-bound function g_{θ_{t+1}}(θ) at θ_{t+1} and repeat the process; θ_t will eventually approach \hat{\theta}.

Appendix B: Jensen's inequality
If f(x) is a convex function, the following inequality holds:
E[f(x)] \ge f(E[x])
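As a quick numerical sanity check (not part of the original article), the inequality can be observed on random samples with the convex function f(x) = x²:

```python
import numpy as np

x = np.random.default_rng(0).normal(size=10_000)
f = lambda v: v ** 2                # a convex function
print(f(x).mean() >= f(x.mean()))   # True: E[f(x)] >= f(E[x])
```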
Source: http://www.duzhongxiang.com/gmm/