Unsupervised Learning: K-means algorithm

Source: Internet
Author: User

The K-means algorithm is one of the most popular and widely used clustering algorithms.

K-means algorithm

Suppose we want to divide the green points into two clusters. First, randomly select two cluster centroids (cluster centers), then loop over two steps: cluster assignment and move centroids. (Fig. 1)

Cluster assignment: assign each sample in the training set to the red or the blue cluster, depending on which cluster centroid it is closer to. (Fig. 2)

Move centroids: compute the mean position of all the red points and use it as the new red cluster centroid; likewise, the mean position of all the blue points becomes the new blue cluster centroid. (Fig. 3)

Cluster assignment: reassign each sample to a cluster according to its distance from the new cluster centroids. (Fig. 4)

Move centroids: after reassigning the clusters, compute the mean of each cluster as its new cluster centroid. (Fig. 5)

We keep iterating until the cluster centroids and the cluster assignments no longer change. At that point the K-means algorithm has converged: it has found two clusters in the data, and the work is done.
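
The loop described above can be traced on a tiny example. This is only a sketch: the six 1-D points and the fixed starting centroids are made-up values chosen so the trace is repeatable (the text picks the starting centroids at random), and the name `history` is introduced here for illustration.

```python
# Toy 1-D data with two visibly separate groups (made-up values for illustration).
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
# Assumption: fixed starting centroids instead of random ones, so the run is repeatable.
centroids = [2.0, 3.0]

assignments = None
history = []  # centroid positions after each "move centroids" step
while True:
    # Cluster assignment: index of the nearest centroid for every point.
    new_assignments = [
        min(range(len(centroids)), key=lambda k: (x - centroids[k]) ** 2)
        for x in points
    ]
    if new_assignments == assignments:
        break  # assignments did not change, so the algorithm has converged
    assignments = new_assignments
    # Move centroids: each centroid becomes the mean of its assigned points.
    for k in range(len(centroids)):
        members = [x for x, c in zip(points, assignments) if c == k]
        if members:
            centroids[k] = sum(members) / len(members)
    history.append(list(centroids))
    print(assignments, centroids)
```

In this run the point 3.0 starts out in the second cluster and switches to the first one in the second pass; the third pass changes nothing, which is the convergence condition.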

K-means algorithm formally

Input: K, the number of clusters we want to divide the dataset into (we'll talk about how to choose K later).

Training set (no y values, since this is unsupervised learning)

Each x(i) is n-dimensional, not (n+1)-dimensional: we do not add x0 = 1.

Cluster assignment step: for the i-th point in the training data, compute c(i) (that is, color each sample) as the index, from 1 to K, of the nearest cluster centroid. Note that lowercase k refers to the index of a single centroid, while uppercase K is the total number of clusters. Usually we find the nearest centroid by minimizing the squared distance.
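
As a rough sketch of the assignment step, the snippet below computes c(i) for a few 2-D points. The function names `squared_distance` and `assign_clusters` and the sample values are assumptions made for this illustration, not part of the original text.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign_clusters(points, centroids):
    """Return c, where c[i] is the 1-based index k of the centroid nearest to points[i]."""
    c = []
    for x in points:
        distances = [squared_distance(x, mu) for mu in centroids]
        c.append(distances.index(min(distances)) + 1)  # +1 so that k runs from 1 to K
    return c


# Made-up sample data: four points and K = 2 centroids.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.5), (8.5, 9.0)]
labels = assign_clusters(points, centroids)
print(labels)  # [1, 1, 2, 2]
```

Minimizing the squared distance selects the same centroid as minimizing the distance itself, and it avoids computing a square root.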

Move centroid step: recompute the centroid of each cluster as the mean of the points assigned to it in the cluster assignment step.
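
A minimal sketch of the move centroid step follows, assuming 1-based assignments c(i) as described earlier. The name `move_centroids` and the choice to return `None` for a cluster with no points are assumptions of this example.

```python
def move_centroids(points, c, K):
    """Return K new centroids given 1-based assignments c.

    A cluster with no assigned points gets None (an assumption of this sketch).
    """
    centroids = []
    for k in range(1, K + 1):
        members = [x for x, ci in zip(points, c) if ci == k]
        if not members:
            centroids.append(None)
            continue
        n = len(members[0])  # dimensionality of the points
        centroids.append(
            tuple(sum(x[j] for x in members) / len(members) for j in range(n))
        )
    return centroids


# Made-up sample data: the first two points belong to cluster 1, the rest to cluster 2.
points = [(1.0, 1.0), (2.0, 3.0), (8.0, 8.0), (10.0, 10.0)]
c = [1, 1, 2, 2]
new_centroids = move_centroids(points, c, K=2)
print(new_centroids)  # [(1.5, 2.0), (9.0, 9.0)]
```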

What if a cluster centroid has no points assigned to it? Normally we remove that cluster centroid and end up with K-1 clusters. If we really do need K clusters, the alternative is to find a new cluster centroid to replace it. Removing the empty cluster centroid is the more common choice.
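
Both options can be sketched in a few lines. The text does not say how a replacement centroid is found, so re-seeding from a randomly chosen data point is an assumption made here, as are the function name `handle_empty_clusters` and its `keep_k` and `rng` parameters.

```python
import random


def handle_empty_clusters(centroids, counts, points, keep_k=False, rng=None):
    """Drop centroids that have no assigned points, or re-seed them if keep_k is True.

    centroids: list of centroid tuples
    counts:    counts[k] is the number of points currently assigned to centroids[k]
    """
    rng = rng or random.Random()
    result = []
    for mu, count in zip(centroids, counts):
        if count > 0:
            result.append(mu)
        elif keep_k:
            # Assumed strategy: replace the empty centroid with a random data point.
            result.append(rng.choice(points))
        # Otherwise the empty centroid is dropped, leaving K-1 clusters.
    return result


# Made-up sample data: the middle centroid attracted no points.
points = [(1.0, 1.0), (2.0, 1.0), (9.0, 9.0), (8.0, 9.0)]
centroids = [(1.5, 1.0), (50.0, 50.0), (8.5, 9.0)]
counts = [2, 0, 2]

dropped = handle_empty_clusters(centroids, counts, points)
reseeded = handle_empty_clusters(centroids, counts, points, keep_k=True,
                                 rng=random.Random(0))
print(len(dropped), len(reseeded))  # 2 3
```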

Applying K-means to data without clearly separated clusters

The image on the left shows K-means applied to a dataset that is clearly divided into three clusters.

K-means can also be applied to data like that in the figure on the right, where no obvious clusters stand out. This is a T-shirt sizing example: suppose you want to design T-shirts in three sizes (S, M, L) but do not know how big each size should be. You collect the height and weight of the people who wear your T-shirts (the main factors affecting T-shirt size), as plotted in the figure, apply the K-means algorithm to divide the data into three clusters, and then design one size for each cluster. This is also an example of market segmentation: K-means divides the market into three parts so that you can distinguish three groups of customers and better adapt to their different needs (such as S, M, and L clothes).

Summary

    1. Randomly select the cluster centroids (cluster centers).
    2. Cluster assignment step: color each sample point according to its nearest cluster centroid.
    3. Move centroid step: recompute the cluster centroids (cluster centers) from the coloring results.
    4. Repeat steps 2 and 3 until convergence (the cluster centroids and the coloring results no longer change).
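
The four steps above can be put together in one function. This is a hedged sketch rather than a reference implementation: the name `kmeans`, the `seed` and `max_iters` parameters, the choice of initial centroids from the data points, and the handling of empty clusters (the old centroid is simply kept) are all assumptions of this example, and the sample points are made up.

```python
import random


def kmeans(points, K, seed=None, max_iters=100):
    """Plain K-means on a list of equal-length tuples. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Step 1: pick K of the data points as initial centroids (one common choice).
    centroids = rng.sample(points, K)
    labels = None
    for _ in range(max_iters):
        # Step 2: color each point with the index of its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))
            for x in points
        ]
        # Step 4: stop once the coloring no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each centroid to the mean of the points colored with it.
        for k in range(len(centroids)):
            members = [x for x, c in zip(points, labels) if c == k]
            if members:  # an empty cluster keeps its old centroid in this sketch
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Made-up sample data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, K=2, seed=0)
print(centroids, labels)
```

Which group ends up with index 0 depends on the random starting centroids, so only the grouping itself, not the index values, should be relied on.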
