K-means clustering algorithm (non-MapReduce implementation)

Source: Internet
Author: User

Cite: http://www.cnblogs.com/jerrylead/archive/2011/04/06/2006910.html

1. Concept

The K-means algorithm accepts an input k and partitions n data objects into k clusters so that objects within the same cluster are highly similar, while objects in different clusters have low similarity. Cluster similarity is measured using the mean of the objects in each cluster, which serves as the cluster's "central object" (center of gravity, or centroid).

2. General Introduction

Clustering belongs to unsupervised learning. The methods covered earlier, such as regression, Naive Bayes, and SVM, all work with a category label y; that is, the class of each training sample is given. In clustering, no y is given; only the features x are available. For example, suppose the stars in the universe are represented as a point set in 3D space. The purpose of clustering is to find the potential category y of each sample x and to group together the samples x that share the same category y. If the stars above are clustered, the result is a set of star clusters: points within a cluster are close to each other, while the distance between clusters is large.

In the clustering problem, the training set is {x^(1), x^(2), ..., x^(m)}, where each x^(i) ∈ R^n; no labels y are given.

The K-means algorithm clusters samples into k clusters. The specific algorithm is described as follows:

1. Randomly select k cluster centroids μ_1, μ_2, ..., μ_k ∈ R^n.

2. Repeat the following until convergence {

    For each sample i, compute the cluster it belongs to: c^(i) := arg min_j ||x^(i) − μ_j||^2

    For each cluster j, recalculate its centroid: μ_j := Σ_i 1{c^(i) = j} x^(i) / Σ_i 1{c^(i) = j}

}

Here k is the number of clusters, given in advance. c^(i) denotes the cluster whose centroid is closest to sample i; its value is one of 1 through k. The centroid μ_j represents our current guess at the center of the samples belonging to cluster j. To illustrate with the star-cluster model: to cluster all stars into k star clusters, first randomly select k points in the universe (or k stars) as the centroids of the k clusters. In the first step, compute the distance from each star to each of the k centroids and assign the star to the nearest cluster; after this step, every star belongs to some cluster. In the second step, for each cluster, recompute its centroid as the mean of all the stars in it. Repeat steps 1 and 2 until the centroids remain unchanged or change very little.
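The two alternating steps can be sketched in code. This is a minimal illustration written for this article (not the original author's implementation), assuming NumPy is available; initializing the centroids by sampling k data points is one common choice:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Cluster the rows of X into k clusters; return (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Randomly pick k distinct samples as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 1: assign each sample to the cluster with the nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: move each centroid to the mean of the samples assigned to it
        # (an empty cluster keeps its old centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Converged: the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

For production use, scikit-learn's sklearn.cluster.KMeans offers the same algorithm with a smarter (k-means++) initialization.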

[Figure: K-means clustering of n sample points, with k = 2.]

The first question facing K-means is how to guarantee convergence. The algorithm above uses convergence as its termination condition, and it can be proved that K-means is guaranteed to converge. The following gives a qualitative account of that convergence. Define the distortion function as:

J(c, μ) = Σ_{i=1}^{m} ||x^(i) − μ_{c^(i)}||^2

The function J is the sum of squared distances between each sample point and the centroid of its cluster; K-means adjusts c and μ to minimize J. Suppose J has not yet reached its minimum. First, fix the centroids μ and adjust each sample's assignment c^(i): this can only reduce J. Then, with the assignments c fixed, adjust each centroid μ_j: this also reduces J. These two steps are exactly the inner loop, which decreases J monotonically. When J reaches a (local) minimum, c and μ have converged as well. (In theory, several different pairs of c and μ could achieve the same minimum of J, but in practice this is rare.)
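The monotone-decrease argument can be checked numerically. The sketch below (assuming NumPy; the random data and k = 3 are arbitrary choices for illustration) applies each step once and verifies that J never increases:

```python
import numpy as np

def distortion(X, labels, centroids):
    # J(c, mu) = sum over samples of ||x_i - mu_{c_i}||^2
    return float(((X - centroids[labels]) ** 2).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
centroids = X[:3].copy()               # k = 3 arbitrary initial centroids
labels = rng.integers(0, 3, size=50)   # arbitrary initial assignment

# Step 1 (centroids fixed): reassigning each point to its nearest
# centroid minimizes each point's own term, so J cannot increase.
j0 = distortion(X, labels, centroids)
d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
labels = d.argmin(axis=1)
j1 = distortion(X, labels, centroids)

# Step 2 (assignments fixed): the mean minimizes the sum of squared
# distances within each cluster, so J cannot increase either.
for j in range(3):
    if np.any(labels == j):
        centroids[j] = X[labels == j].mean(axis=0)
j2 = distortion(X, labels, centroids)

assert j2 <= j1 <= j0
```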

Since the distortion function J is non-convex, the minimum found may not be the global minimum; in other words, K-means is sensitive to the initial placement of the centroids. In general, though, the local optimum K-means reaches is good enough. If you are worried about a poor local optimum, you can run K-means multiple times with different initial values and then keep the c and μ output from the run with the smallest J.
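A multi-restart wrapper along these lines might look as follows. This is a self-contained sketch assuming NumPy; the inner kmeans is a minimal single-run implementation written for this example, and picking seeds 0..n_restarts−1 is just one simple way to vary the initialization:

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """One K-means run: random initialization, then alternate the two steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

def kmeans_best_of(X, k, n_restarts=10):
    """Run K-means from n_restarts different seeds; keep the lowest-J run."""
    best_j, best = np.inf, None
    for seed in range(n_restarts):
        labels, centroids = kmeans(X, k, seed=seed)
        j = float(((X - centroids[labels]) ** 2).sum())  # distortion J
        if j < best_j:
            best_j, best = j, (labels, centroids)
    return best_j, best
```

scikit-learn's KMeans exposes the same idea through its n_init parameter, keeping the fit with the lowest inertia (its name for J).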

