#Paper reading# X-means: Extending K-means with Efficient Estimation of the Number of Clusters

Title: X-means: Extending K-means with Efficient Estimation of the Number of Clusters
Paper URL: http://cs.uef.fi/~zhao/Courses/Clustering2012/Xmeans.pdf

Summary of the paper:
To address several shortcomings of K-means, the paper proposes X-means, an extension of K-means that can run faster than K-means.

1. K-means has three major drawbacks: ① it is computationally expensive; ② K must be specified by hand; ③ it easily gets stuck in a local minimum. The paper therefore presents the X-means algorithm, which solves ① and ② and mitigates ③.

2. X-means makes three improvements: ① it uses a kd-tree to accelerate each iteration of the original K-means; ② the user specifies a range for K, and the best K is selected by BIC score; ③ each iteration only runs 2-means locally (2-means is not very sensitive to poor local optima). [1]
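To make the BIC score in ② concrete, one way to write it for hard-assigned, identical spherical Gaussian clusters is shown below. This is a sketch under those assumptions, with R points in M dimensions, K clusters, R_n points in cluster n, and μ_(i) the center assigned to point x_i; the paper's own expressions may differ in normalization (for example in how the variance is estimated). Higher is better: the log-likelihood rewards fit, and the second term of the BIC penalizes the p free parameters (K − 1 mixing weights, MK center coordinates and one shared variance).

```latex
\begin{align*}
\hat{\sigma}^2 &= \frac{1}{M(R-K)} \sum_{i=1}^{R} \lVert x_i - \mu_{(i)} \rVert^2 \\
\hat{\ell}(D) &= \sum_{n=1}^{K} R_n \log\frac{R_n}{R}
  - \frac{RM}{2}\log\!\left(2\pi\hat{\sigma}^2\right) - \frac{M(R-K)}{2} \\
p &= (K-1) + MK + 1, \qquad
\mathrm{BIC} = \hat{\ell}(D) - \frac{p}{2}\log R
\end{align*}
```

The last term of the log-likelihood comes from substituting the variance estimate into the Gaussian exponent: the sum of squared distances divided by 2σ̂² equals M(R − K)/2.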

3. The detailed X-means procedure:
① Start from a single cluster center;
② For each current cluster center, find two relatively distant points in its cluster to serve as initial centers, then run K-means with k = 2 on that cluster;
③ Compute the BIC score before and after the split, and decide whether to keep the unsplit state or the split state;
④ Return to ② and iterate; stop when two consecutive iterations produce the same number of centers.
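The procedure above can be sketched in plain Python. This is a simplified illustration, not the paper's implementation: it skips the kd-tree acceleration, does not re-run K-means over all centers between rounds, picks the two starting points of each 2-means as the point farthest from the cluster mean and then the point farthest from that one (one way to read "two relatively distant points"), and scores models with a spherical-Gaussian BIC whose exact constants are an assumption. The names `x_means`, `two_means`, `bic` and the `k_max` cap are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def bic(clusters):
    """BIC (higher is better) of a hard clustering, ASSUMING identical spherical
    Gaussians with one shared per-dimension variance.
    `clusters` is a list of (center, points) pairs with non-empty point lists."""
    R = sum(len(pts) for _, pts in clusters)   # number of points
    K = len(clusters)                          # number of clusters
    M = len(clusters[0][0])                    # dimensionality
    if R <= K:
        return -math.inf
    sse = sum(sq_dist(p, c) for c, pts in clusters for p in pts)
    var = max(sse / (M * (R - K)), 1e-12)      # floor avoids log(0)
    loglik = (sum(len(pts) * math.log(len(pts) / R) for _, pts in clusters)
              - R * M / 2 * math.log(2 * math.pi * var)
              - M * (R - K) / 2)
    p = (K - 1) + M * K + 1                    # free parameters
    return loglik - p / 2 * math.log(R)


def two_means(points, iters=20):
    """Run 2-means on one cluster; return two (center, points) pairs, or None."""
    c = mean(points)
    a = max(points, key=lambda p: sq_dist(p, c))   # far from the cluster mean
    b = max(points, key=lambda p: sq_dist(p, a))   # far from a
    centers = [list(a), list(b)]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            nearer = 0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
            groups[nearer].append(p)
        if not groups[0] or not groups[1]:
            return None                            # degenerate split
        new_centers = [mean(g) for g in groups]
        if new_centers == centers:
            break
        centers = new_centers
    return list(zip(centers, groups))


def x_means(points, k_max=20):
    """Simplified X-means sketch: returns the list of cluster centers."""
    clusters = [(mean(points), list(points))]      # step 1: one center
    while True:
        next_clusters = []
        for center, pts in clusters:
            children = two_means(pts) if len(pts) > 2 else None   # step 2
            # step 3: keep the split only if it raises the local BIC
            if children and bic(children) > bic([(center, pts)]):
                next_clusters.extend(children)
            else:
                next_clusters.append((center, pts))
        # step 4: stop once the number of centers no longer changes
        if len(next_clusters) == len(clusters) or len(next_clusters) > k_max:
            break
        clusters = next_clusters
    return [center for center, _ in clusters]
```

Because each split is decided only by comparing one parent cluster with its two children, a poor 2-means split (for example one that cuts a true cluster in half) is never undone in this sketch; a global K-means pass over all centers between rounds would be one way to mitigate that.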

4. In the experiments, X-means ran 2 to 8 times faster than K-means with K chosen by enumeration, which can substantially shorten computation time.

5. Thoughts: as this paper shows, X-means often gives slightly less satisfactory results than running K-means with brute-force enumeration of K, but it is more efficient. However, if K is instead searched by bisection, running K-means at each probe, the results may also be better than those of X-means while remaining efficient. So my view is: when the number of classes is relatively small, enumerate K directly and run K-means; when it is larger, bisect K and run K-means, or try X-means.
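One way to read the "bisect K" suggestion is as a binary search for the peak of a model-selection score over K. The sketch below assumes the score (for instance a BIC computed after running K-means with K clusters) rises strictly up to its peak and never rises after it; that assumption is not guaranteed on real data, and when it fails the search can stop at a local peak. `bisect_best_k` and `score` are hypothetical names.

```python
from functools import lru_cache


def bisect_best_k(score, k_min, k_max):
    """Return the K in [k_min, k_max] that maximizes score(K).

    ASSUMES score is unimodal in K (strictly increasing up to the peak,
    non-increasing after it); otherwise the result may be a local peak."""
    score = lru_cache(maxsize=None)(score)   # each K is evaluated at most once
    lo, hi = k_min, k_max
    while lo < hi:
        mid = (lo + hi) // 2
        if score(mid) < score(mid + 1):
            lo = mid + 1    # still rising: the peak lies to the right of mid
        else:
            hi = mid        # falling or flat: the peak is at mid or to its left
    return lo
```

In practice `score(k)` would run K-means with k clusters and return a score such as BIC. Each probe compares score(mid) with score(mid + 1), so only about 2·log2(k_max − k_min + 1) K-means runs are needed instead of one per candidate K, and the cache avoids re-running K-means for a K that was already evaluated.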

6. Extension: the k-means++ algorithm. Because the result of K-means depends heavily on the initial positions of the cluster centers, k-means++ chooses initial centers that are as far from one another as possible. [2]
The detailed steps are as follows:
① Randomly select one point from the input data as the first cluster center;
② For each point x in the dataset, compute its distance D(x) to the nearest cluster center (that is, the nearest of the centers already selected);
③ Select a new data point as the next cluster center, such that points with larger D(x) are more likely to be chosen (in standard k-means++ the probability is proportional to D(x)²); roulette-wheel selection can be used;
④ Repeat ② and ③ until K cluster centers have been selected;
⑤ Run the standard K-means algorithm from these K initial cluster centers.
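The seeding steps ① to ④ and the final K-means run in ⑤ can be sketched as follows. It follows the usual k-means++ rule of sampling with probability proportional to D(x)², implemented as roulette-wheel (fitness-proportional) selection via `random.choices`; the function names `kmeans_pp_init` and `kmeans` and the plain Lloyd iteration used for step ⑤ are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng=random):
    """k-means++ seeding: return k initial centers chosen from `points`."""
    centers = [rng.choice(points)]                 # step 1: uniform first pick
    while len(centers) < k:
        # step 2: D(x)^2, squared distance to the nearest center chosen so far
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:                           # every point coincides with a center
            centers.append(rng.choice(points))
            continue
        # step 3: roulette-wheel selection, P(x) proportional to D(x)^2
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers                                 # step 4: stop at k centers


def kmeans(points, centers, max_iter=100):
    """Step 5: standard Lloyd iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        # move each center to the mean of its group (keep it if the group is empty)
        new_centers = [[sum(col) / len(g) for col in zip(*g)] if g else c
                       for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

Points that are already centers have D(x) = 0 and therefore zero weight, so the seeding never picks the same point twice unless the data contains duplicates.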

References:
[1] http://www.cnblogs.com/porco/p/xmeans_intro.html
[2] http://www.cnblogs.com/shelocks/archive/2012/12/20/2826787.html

All of the above reflects my personal understanding. My knowledge is limited, so if you find any errors or omissions, please point them out. Thank you.
