Unsupervised Learning: K-means algorithm

Last Update: 2016-02-29 Source: Internet

Author: User

The K-means algorithm is one of the most popular and widely used clustering algorithms.

K-means algorithm

Suppose we want to divide the green points into two clusters. First, randomly select two cluster centroids (cluster centers), then repeat two steps in a loop: cluster assignment and moving the centroids. (Fig. 1)

Cluster assignment: assign each sample in the training set to the red or the blue cluster, depending on whether it is closer to the red or the blue cluster centroid. (Fig. 2)

Move centroids: compute the mean position of all the red points and use it as the new red cluster centroid; likewise, compute the mean position of all the blue points and use it as the new blue cluster centroid. (Fig. 3)

Cluster assignment: reassign each sample to a cluster according to its distance from the new cluster centroids. (Fig. 4)

Move centroids: after reassigning the samples, compute the mean of each cluster as its new cluster centroid. (Fig. 5)

We keep iterating until the cluster centroids and the cluster assignments no longer change. At that point the K-means algorithm has converged: it has found two clusters in this data, and the work is done.

K-means algorithm formally

Input: K, the number of clusters we want to divide the dataset into (we'll talk about how to choose K later).

Training set (no y values, since this is unsupervised learning)

Each x(i) is n-dimensional, not (n+1)-dimensional: we do not add x0 = 1.

Cluster assignment step: for the i-th point in the training data, compute c(i) (that is, "color" each sample) as the index (from 1 to K) of the cluster centroid nearest to it. Note that the k in μk is lowercase and refers to the index of a centroid, whereas the K in "K cluster centroids" is uppercase and denotes the total number of clusters. Usually we minimize the squared distance to find the nearest centroid.
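The assignment step can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names squared_distance and assign_clusters and the sample points are made up for this example, and the indices are 0-based (0 to K-1) rather than the 1 to K used in the text.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_clusters(points, centroids):
    """Return c(i) for every point: the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(x, centroids[k]))
        for x in points
    ]


# Hypothetical toy data: four 2-D samples and K = 2 centroids.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]
labels = assign_clusters(points, centroids)
print(labels)  # [0, 0, 1, 1]
```

Minimizing the squared distance gives the same nearest centroid as minimizing the distance itself, and it avoids a square root.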

Move centroid step: recompute the centroid of each cluster as the mean of the points assigned ("colored") to it.
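A minimal sketch of the move centroid step, again in plain Python. The name move_centroids and the toy data are assumptions for illustration; a cluster with no assigned points is returned as None here so that the caller can decide what to do with it.

```python
def move_centroids(points, labels, k):
    """Return the mean of the points assigned to each of the k clusters.

    A cluster with no assigned points yields None.
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, c in zip(points, labels):
        counts[c] += 1
        for j in range(dim):
            sums[c][j] += x[j]
    return [
        tuple(s / counts[c] for s in sums[c]) if counts[c] else None
        for c in range(k)
    ]


# Hypothetical toy data: labels use 0-based cluster indices.
points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
labels = [0, 0, 1, 1]
new_centroids = move_centroids(points, labels, k=2)
print(new_centroids)  # [(1.5, 2.0), (9.0, 9.0)]
```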

What if a cluster centroid has no points assigned to it? Normally we remove this cluster centroid, which leaves K-1 clusters. If we really need K clusters, the alternative is to randomly reinitialize that cluster centroid. Removing the centroid is the more common choice.
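Both options for an empty cluster can be sketched as follows. The helper handle_empty_clusters is hypothetical, it assumes empty clusters are marked with None, and reinitializing to a randomly chosen training point is only one possible way to pick a new centroid.

```python
import random


def handle_empty_clusters(centroids, points, reinitialize=False, rng=None):
    """Deal with empty clusters, which are marked as None in `centroids`.

    reinitialize=False: drop empty clusters, leaving fewer than K centroids.
    reinitialize=True:  replace each empty centroid with a random training point.
    """
    if rng is None:
        rng = random.Random()
    if reinitialize:
        return [c if c is not None else rng.choice(points) for c in centroids]
    return [c for c in centroids if c is not None]


# Hypothetical example: the second of K = 3 centroids received no points.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), None, (8.5, 9.0)]

dropped = handle_empty_clusters(centroids, points)  # K-1 centroids remain
reseeded = handle_empty_clusters(centroids, points, reinitialize=True)  # still K
```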

Applying K-means to data without clearly separated clusters

The figure on the left shows K-means applied to a dataset that is clearly divided into three clusters.

K-means can also be applied to data like that in the figure on the right, where the dataset does not appear to contain obvious clusters. Take T-shirt sizing as an example: you want to design T-shirts in three sizes (S, M, L) but do not know how big each size should be. You collect the height and weight of the people who wear your T-shirts (the main factors that determine T-shirt size), as shown in the figure on the right, apply the K-means algorithm to divide the data into three clusters, and then design one size for each cluster. This is an example of market segmentation: K-means divides the market into three parts, so that you can distinguish three different groups of customers and better serve their different needs (such as S, M, and L clothing sizes).

Summary

1. Randomly select the cluster centroids (cluster centers).
2. Cluster assignment step: "color" each sample point with its nearest cluster centroid.
3. Move centroid step: recompute the cluster centroids (cluster centers) from the results of the coloring.
4. Repeat steps 2 and 3 until convergence (the cluster centroids and the coloring no longer change).
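Put together, the steps above can be sketched as one small function. This is an illustrative sketch, not a production implementation: the name kmeans, the max_iters and seed parameters, and the toy data are assumptions, the initial centroids are drawn from the training points (one common way to select them randomly), and an empty cluster simply keeps its previous centroid.

```python
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Cluster `points` (tuples of floats) into k groups.

    Returns (centroids, labels), where labels[i] is the 0-based index
    of the centroid assigned to points[i].
    """
    rng = random.Random(seed)
    # Step 1: randomly select k distinct training points as initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Step 2: cluster assignment by smallest squared distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
            )
            for x in points
        ]
        # Step 4: unchanged assignments mean the centroids are stable too.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [x for x, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its old centroid in this sketch
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical toy data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
```

Which group receives index 0 depends on the random initialization, so only the grouping itself is meaningful, not the label values.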


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
