K-means Clustering

Source: Internet
Author: User

K-means is a clustering algorithm. It belongs to unsupervised learning: the training samples carry no explicit class labels.

Given n training samples {x_1, x_2, x_3, ..., x_n}

The K-means algorithm proceeds as follows:

1. Choose K points as the starting centroids c_1, c_2, ..., c_K.

2. Repeat the following until convergence:
   For each sample x_i:
      For each centroid c_j:
         Record the distance between c_j and x_i.
      Assign x_i to its nearest centroid.
   For each cluster, compute the mean of all samples assigned to it and use that mean as the new centroid.
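
The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: it assumes Euclidean distance and samples given as tuples of numbers. The function name `kmeans` and the parameters `max_iter` and `seed` are illustrative, and drawing the starting centroids at random from the samples is an assumption, since the text only says to create K starting points.

```python
import math
import random


def kmeans(samples, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1 (assumption): use K distinct samples as the starting centroids.
    centroids = rng.sample(samples, k)
    labels = None
    for _ in range(max_iter):
        # Step 2a: assign every sample to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(x, centroids[j]))
            for x in samples
        ]
        # Converged: no sample changed cluster since the last pass.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2b: move each centroid to the mean of its samples.
        for j in range(k):
            members = [x for x, lab in zip(samples, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

Calling `kmeans(samples, 2)` returns the two centroids and, for each sample, the index of the cluster it was assigned to.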

Figure: the effect of K-means clustering on n sample points, with K = 2.

  A few things to note:

  How to choose the K initial points

  1. Select K points that are as far away from each other as possible

First, randomly select a point P1 as the centroid of the first cluster. Then select the point farthest from P1 as P2, the centroid of the second cluster.

Then select, as the centroid of the third cluster, the point whose distance to the nearer of P1 and P2 is largest, that is, the point p that maximizes min(d(p, P1), d(p, P2)).

Continue in the same way until K points have been chosen.
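
A short Python sketch of this farthest-point selection, assuming Euclidean distance and samples given as tuples. The name `farthest_first_centroids` and the `seed` parameter are illustrative and do not come from the original.

```python
import math
import random


def farthest_first_centroids(samples, k, seed=0):
    """Pick K initial centroids that are far apart from each other."""
    rng = random.Random(seed)
    # P1: a randomly chosen sample.
    centroids = [rng.choice(samples)]
    # min_dist[i] = distance from samples[i] to its nearest chosen centroid.
    min_dist = [math.dist(x, centroids[0]) for x in samples]
    while len(centroids) < k:
        # Next centroid: the sample that maximizes the minimum distance
        # to the centroids chosen so far.
        i = max(range(len(samples)), key=min_dist.__getitem__)
        centroids.append(samples[i])
        min_dist = [
            min(d, math.dist(x, samples[i])) for d, x in zip(min_dist, samples)
        ]
    return centroids
```

Keeping the running minimum in `min_dist` means each new centroid costs one pass over the samples rather than a comparison against every centroid chosen earlier.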

2. Use hierarchical clustering or the canopy algorithm to produce an initial clustering, then use the center points of those clusters as the initial centroids for K-means.

Requirements: the sample set is relatively small, for example hundreds to thousands of samples (hierarchical clustering is expensive), and K is small relative to the sample size.
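
As a rough illustration of the hierarchical option, the sketch below merges clusters bottom-up until K remain and returns their centers. The text does not say which linkage to use, so merging the two clusters with the closest centers is an assumption, as is the name `agglomerative_centroids`. The naive pair search is slow, which is in line with the requirement that the sample set be small.

```python
import math


def agglomerative_centroids(samples, k):
    """Merge clusters bottom-up until K remain; return their centers."""
    # Start with every sample in its own cluster.
    clusters = [[x] for x in samples]

    def center(cluster):
        return tuple(sum(c) / len(cluster) for c in zip(*cluster))

    while len(clusters) > k:
        # Find the pair of clusters whose centers are closest (assumed linkage).
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda p: math.dist(center(clusters[p[0]]), center(clusters[p[1]])),
        )
        # a < b, so removing cluster b does not shift the index of cluster a.
        clusters[a].extend(clusters.pop(b))
    return [center(c) for c in clusters]
```

The returned centers can then be used as the starting centroids for K-means in place of randomly chosen points.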

  How to determine the value of K

  PS: Each class is called a cluster. The diameter of a cluster is the maximum distance between any two points in the cluster; the radius of a cluster is the maximum distance from any point in the cluster to the cluster centroid.

Choose a suitable cluster indicator. It can be a weighted average, taken over all clusters, of the cluster radius, the cluster diameter, or the average distance to the centroid (the weight can be the number of points in each cluster).
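
The definitions above translate fairly directly into code. The following Python sketch assumes Euclidean distance and clusters given as lists of tuples; the function names are illustrative and not taken from the original.

```python
import math
from itertools import combinations


def centroid_of(cluster):
    """Mean of all points in the cluster."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))


def radius(cluster):
    """Maximum distance from any point in the cluster to its centroid."""
    c = centroid_of(cluster)
    return max(math.dist(x, c) for x in cluster)


def diameter(cluster):
    """Maximum distance between any two points in the cluster."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def weighted_indicator(clusters, measure):
    """Average of measure(cluster), weighted by the number of points."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) * measure(c) for c in clusters) / total
```

For example, `weighted_indicator(clusters, radius)` gives the size-weighted average radius, and passing `diameter` instead gives the size-weighted average diameter.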

Try K = 1, 2, 4, 8, 16, ... in turn.

Generally, while K is below the true number of clusters, the cluster indicator decreases as K increases; once K exceeds the true number of clusters, the indicator levels off.

Find the turning point of this curve (shown in the figure): first determine the approximate range of K from the doubling sequence, then find the value of K by binary search within that range.
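
One possible way to code this search is sketched below. The text does not define the turning point precisely, so the test used here is an assumption: K counts as "stable" when doubling it reduces the indicator by no more than a relative tolerance `tol`. The names `find_k`, `indicator`, `tol`, and `k_max` are hypothetical; `indicator(k)` stands for any function that clusters the data with K = k and returns the chosen indicator.

```python
def find_k(indicator, tol=0.1, k_max=1024):
    """Doubling followed by binary search for the smallest stable K."""

    def is_flat(k):
        # Assumed criterion: going from k to 2k clusters lowers the
        # indicator by at most tol times its current value.
        before = indicator(k)
        return before - indicator(2 * k) <= tol * before

    # Phase 1: try K = 1, 2, 4, 8, ... until the curve flattens
    # (or k_max is reached, in which case hi is only an upper bound).
    hi = 1
    while hi < k_max and not is_flat(hi):
        hi *= 2
    lo = hi // 2  # largest tested K that was not flat (0 if hi == 1)

    # Phase 2: binary search for the smallest flat K in (lo, hi].
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_flat(mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Each call to `indicator` implies a full clustering run, so in practice it may be worth caching its results; the binary search assumes that once the curve is flat it stays flat for larger K.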

  

Algorithm stopping conditions

  1. Stop after a specified number of iterations.

2. Stop when the objective function converges.
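
Both conditions can be combined in a small check, sketched below in Python. The text does not name the objective function; the sum of squared distances from each sample to its assigned centroid, the usual K-means objective, is assumed here. The names `sse` and `should_stop` and the defaults for `max_iter` and `tol` are illustrative.

```python
import math


def sse(samples, centroids, labels):
    """Sum of squared distances from each sample to its assigned centroid."""
    return sum(math.dist(x, centroids[j]) ** 2 for x, j in zip(samples, labels))


def should_stop(iteration, prev_obj, obj, max_iter=100, tol=1e-6):
    """Return True when either stopping condition holds."""
    # Condition 1: the iteration budget is used up.
    if iteration >= max_iter:
        return True
    # Condition 2: the objective changed by less than tol since the
    # previous iteration (prev_obj is None on the first iteration).
    return prev_obj is not None and abs(prev_obj - obj) < tol
```

Inside the main loop, one would compute `sse` after each centroid update and break out as soon as `should_stop` returns True.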

  

  

Reference: http://www.cnblogs.com/jerrylead/archive/2011/04/06/2006910.html
