K-Means Algorithm Overview

Last Update:2018-12-05 Source: Internet

Author: User


1. Algorithm Flow

Input: the number of clusters k and a data set containing n data objects. Output: k clusters that satisfy the minimum-variance criterion.
(1) Select k of the n data objects as the initial cluster centers.
(2) Compute the distance from each object to every cluster center and reassign each object to the cluster whose center is nearest.
(3) Recompute the mean of each cluster and use it as the new cluster center.
(4) Repeat steps (2) and (3) until no cluster changes.
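
The four steps above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function names (`kmeans`, `squared_distance`, `mean_point`), the `seed` and `max_iter` parameters, the use of squared Euclidean distance, and the rule that an empty cluster keeps its old center are all assumptions made for the example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(data, k, max_iter=100, seed=0):
    """Return (centers, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick k distinct objects as the initial centers.
    centers = rng.sample(data, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in data
        ]
        # Step 4: stop once no assignment changes.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 3: move each center to the mean of its cluster.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = mean_point(members)
    return centers, labels

data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centers, labels = kmeans(data, k=2)
print(centers)
print(labels)
```

On this toy data the three points near (1, 1) end up in one cluster and the three points near (8, 8) in the other; which cluster gets index 0 depends on the random initial choice.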

2. Algorithm Analysis

The K-means optimization objective can be written as:

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \lVert x_n - \mu_k \rVert^2$$

Here x_n is a data object, μ_k is the center of cluster k, N is the number of data objects, and K is the number of clusters. r_nk is 1 if data point n is assigned to cluster k and 0 otherwise.
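
As a small illustration of these definitions, the sketch below evaluates J for a given assignment, assuming J is the usual K-means objective: the sum of squared Euclidean distances from each point to its assigned center. The function name `objective` and the use of a `labels` list in place of the r_nk indicator matrix are choices made for this example.

```python
def objective(data, centers, labels):
    """J = sum over points of the squared distance to the assigned center."""
    total = 0.0
    for point, label in zip(data, labels):
        center = centers[label]  # r_nk is 1 only for k == label
        total += sum((x - c) ** 2 for x, c in zip(point, center))
    return total

data = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
centers = [(1.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1]  # points 0 and 1 belong to center 0, point 2 to center 1
print(objective(data, centers, labels))  # 1 + 1 + 0 = 2.0
```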

The algorithm iterates to find the r_nk and μ_k that minimize J.
Step 2 of the algorithm flow fixes μ_k and updates r_nk by placing each data object in the cluster of its nearest center. With μ_k fixed, this choice minimizes J.
Step 3 of the algorithm flow fixes r_nk and updates μ_k. Differentiate J with respect to each center μ_k and set the result to zero:

$$\frac{\partial J}{\partial \mu_k} = -2 \sum_{n} r_{nk} (x_n - \mu_k) = 0 \quad\Longrightarrow\quad \mu_k = \frac{\sum_{n} r_{nk} x_n}{\sum_{n} r_{nk}}$$

That is, J is smallest when each new center is the mean of the points in its cluster. J is the sum, over all clusters, of the squared distances from each point to its cluster center, so with r_nk fixed this update minimizes J.
Neither step can increase J, so J is non-increasing across iterations and converges. The value it converges to is a local minimum and is not guaranteed to be the global minimum.
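
The behavior of J across the two steps can be checked numerically. The sketch below records J after every update step and every assignment step on a tiny one-dimensional data set, assuming squared Euclidean distance. The helper names (`sq_dist`, `objective`, `assign`, `update`), the data, and the deliberately poor starting centers are illustrative assumptions.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def objective(data, centers, labels):
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))

def assign(data, centers):
    """Step 2: label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in data]

def update(data, labels, centers):
    """Step 3: replace each center with the mean of its cluster."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(data, labels) if lab == j]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)  # assumption: keep the center of an empty cluster
    return new_centers

data = [(0.0,), (1.0,), (2.0,), (9.0,), (10.0,), (11.0,)]
centers = [(0.0,), (1.0,)]  # deliberately poor start
labels = assign(data, centers)
history = [objective(data, centers, labels)]
for _ in range(10):
    centers = update(data, labels, centers)
    history.append(objective(data, centers, labels))  # J after step 3
    labels = assign(data, centers)
    history.append(objective(data, centers, labels))  # J after step 2
print(history)
```

Each entry in `history` is no larger than the one before it, and the sequence settles once the centers reach 1 and 10.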

3. End Condition

K-means can stop iterating when any of the following holds:
· The members of every cluster stop changing, which is the ideal case.
· The difference in J between two consecutive iterations falls below a threshold.
· The number of iterations exceeds a set limit.
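
One way to combine the three conditions in a single loop is sketched below. The function name `kmeans_with_stop`, the parameters `tol` and `max_iter`, the order in which the conditions are checked, and the returned reason strings are all assumptions made for illustration.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_with_stop(data, init_centers, tol=1e-6, max_iter=100):
    """Run K-means from given centers; return (centers, labels, reason)."""
    centers = list(init_centers)
    labels, prev_j = None, None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in data
        ]
        # Condition 1: no point changed cluster.
        if new_labels == labels:
            return centers, labels, "assignments unchanged"
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
        j_value = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
        # Condition 2: J changed by less than the threshold.
        if prev_j is not None and abs(prev_j - j_value) < tol:
            return centers, labels, "change in J below threshold"
        prev_j = j_value
    # Condition 3: the iteration budget is used up.
    return centers, labels, "iteration limit reached"

data = [(0.0,), (1.0,), (2.0,), (9.0,), (10.0,), (11.0,)]
start = [(0.0,), (1.0,)]
print(kmeans_with_stop(data, start)[2])
print(kmeans_with_stop(data, start, tol=100.0)[2])
print(kmeans_with_stop(data, start, max_iter=1)[2])
```

The three calls use the same data and starting centers but trigger a different condition each time, because the threshold and the iteration limit are set differently.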

4. Disadvantages

· The value of K is hard to choose in advance. If the data actually fall into 10 categories and K is set to 20, the result may be poor; if K is set to 10, the result is likely to be good.
· Once K is fixed, choosing the initial centers is also a problem. For a given set of K initial centers the clustering result is determined, so a good choice of centers gives a good clustering and a poor choice gives a poor one.
These are the main disadvantages as I see them. There are improved methods that address them, but they are not covered here. For details, see the Baidu Encyclopedia entry on k-means.
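
The dependence on the initial centers can be seen on a very small example. The sketch below runs the same K-means loop twice on four points at the corners of a wide rectangle, once from each of two hand-picked starting configurations, and compares the final sum of squared distances J. The function name `run_kmeans`, the data, and both sets of starting centers are assumptions chosen to make the effect visible.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def run_kmeans(data, init_centers, max_iter=100):
    """Plain K-means from explicit initial centers; returns (centers, J)."""
    centers = list(init_centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in data
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # assumption: an empty cluster keeps its old center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    j_value = sum(sq_dist(p, centers[lab]) for p, lab in zip(data, labels))
    return centers, j_value

# Four corners of a rectangle that is 4 wide and 1 tall.
data = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start with one center on each short side: splits the data left/right.
good_centers, good_j = run_kmeans(data, [(0.0, 0.0), (4.0, 0.0)])
# Start with both centers on the same short side: splits the data top/bottom.
bad_centers, bad_j = run_kmeans(data, [(0.0, 0.0), (0.0, 1.0)])

print(good_centers, good_j)  # J = 1.0
print(bad_centers, bad_j)    # J = 16.0
```

Both runs stop because the assignments no longer change, yet the second run ends with a J sixteen times larger. A common remedy, not taken from the article, is to run the algorithm from several starting configurations and keep the result with the smallest J.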

5. Key Points

This article makes two main points:
the three ending conditions of K-means (assignments unchanged, small change in J, iteration limit) and its two disadvantages (choosing K, choosing the K initial centers).

6. Reference

K-MEANS, http://baike.baidu.com/view/31854.htm
Baidu Encyclopedia, k-means, http://baike.baidu.com/view/3066906.htm
Talking about Clustering (1): k-means, http://blog.pluskid.org/?p=17#comments

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
