K-means (K-mean) algorithm

Source: Internet
Author: User

The basic idea of the K-means algorithm is to randomly choose the initial centers of K clusters and assign each sample point to the cluster whose center is nearest. The centroid of each cluster is then recomputed as the mean of its points, which gives the new cluster centers. These two steps are iterated until the cluster centers move by less than a given threshold. K is the number of clusters, which must be specified in advance (K is less than the number of samples N).


The K-means clustering algorithm is divided into three steps:
(1) Randomly select K sample points from the data set as the initial cluster centers.
(2) Calculate the distance from each point to every cluster center, and assign each point to the cluster whose center is closest to it.
(3) Recalculate the mean of the coordinates of the points in each cluster, and use that mean as the new cluster center.
Repeat (2) and (3) until the cluster centers no longer change or change very little.
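
The three steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, the parameters `tol`, `max_iter` and `seed`, the use of Euclidean distance, and the rule that an empty cluster keeps its previous center are all assumptions added for the example.

```python
import math
import random


def kmeans(points, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step (1): pick k distinct sample points as the initial centers.
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step (2): assign every point to its nearest center
        # (Euclidean distance is an assumption of this sketch).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step (3): move each center to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Stop when no center moved by more than `tol`.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:
            break
    return centers, labels


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0),
          (10.0, 0.0), (11.0, 0.5), (12.0, 0.0)]
centers, labels = kmeans(points, k=2)
print(sorted(centers))
print(labels)
```

On this toy data the two centers should settle near (1, 0.17) and (11, 0.17), the means of the left and right groups. Which of the two clusters receives label 0 depends on the random initial choice.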


The following figure shows the effect of K-means clustering on n sample points, here k=2:



Advantages of the K-means algorithm:
It is a classical algorithm for solving the clustering problem, and it is simple and fast. When processing large data sets, the algorithm remains scalable and efficient. It works best when the resulting clusters are dense.

Disadvantages of the K-means algorithm:
The value of K must be given in advance, and a suitable value is hard to estimate: often there is no prior knowledge of how many categories a given data set should be divided into. The algorithm is sensitive to the initial cluster centers and may produce different results for different initial values (the k-means++ algorithm can mitigate this problem by selecting the initial points more carefully). It is not suitable for finding clusters with non-convex shapes or clusters of very different sizes. It is also sensitive to noise and outliers.
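
As a rough illustration of the k-means++ initialization mentioned above: the first center is drawn uniformly at random, and each later center is drawn with probability proportional to the squared distance from a point to its nearest already-chosen center. The sketch below assumes Euclidean distance. The function name `kmeans_pp_init`, the `seed` parameter, and the fallback for the case where all remaining weights are zero are hypothetical choices made for this example.

```python
import math
import random


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers from `points` using k-means++ style seeding."""
    rng = random.Random(seed)
    # The first center is chosen uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:
            # Assumption: if every point coincides with a chosen center,
            # fall back to a uniform random pick.
            centers.append(rng.choice(points))
        else:
            # Far-away points are more likely to become the next center;
            # points already chosen have weight 0 and are not picked again.
            centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


points = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0),
          (10.0, 0.0), (11.0, 0.5), (12.0, 0.0)]
init_centers = kmeans_pp_init(points, k=2)
print(init_centers)
```

The returned points would replace the purely random choice in step (1). Because the selection is still random, this reduces the sensitivity to initialization but does not remove it.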


Therefore, the K-means algorithm is often used as one stage within other clustering algorithms, such as spectral clustering.



