A brief introduction to K-means and KNN algorithms

Source: Internet
Author: User

K-means algorithm

The K-means algorithm takes an input k and partitions the N data objects into k clusters such that objects within the same cluster are highly similar to one another, while objects in different clusters have low similarity. Cluster similarity is computed with respect to a "center object" (centroid), obtained as the mean of the objects in each cluster.

The working process of the K-means algorithm is as follows. First, k of the N data objects are selected as the initial cluster centers. Each remaining object is then assigned to the cluster whose center (the cluster's representative) it is most similar to, i.e., closest in distance. Next, the center of each new cluster is recomputed as the mean of all objects in that cluster. This assignment-and-update process repeats until the standard measure function converges; the mean squared error is generally used as the measure. The resulting k clusters have the following characteristics: each cluster is internally as compact as possible, and the clusters are as well separated from one another as possible.
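The iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's own code; the function name, the random initialization, and the convergence test on the centers are all choices made here for brevity (it also assumes no cluster ever becomes empty):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means sketch: X is an (N, d) array, k the cluster count."""
    rng = np.random.default_rng(seed)
    # Step 1: select k of the N objects as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the mean of the objects in its cluster
        # (assumes each cluster keeps at least one object).
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop once the centers no longer move, i.e. the measure has converged.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

On two well-separated groups of points, the two returned clusters line up with the groups regardless of which objects happen to be picked as initial centers.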

K-Nearest Neighbor (KNN) classification algorithm

The KNN classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. Its idea: if most of the k samples most similar to a given sample in feature space (that is, its nearest neighbors there) belong to a certain category, then the sample belongs to that category as well. In KNN, the selected neighbors are objects that have already been correctly classified, and the method decides the category of a new sample based only on the category of its nearest one or few samples. Although the KNN method relies on the limit theorem in principle, its classification decisions involve only a small number of neighboring samples. Because KNN depends mainly on a limited set of surrounding samples rather than on discriminating class domains, it is better suited than other methods to sample sets whose class domains cross or overlap heavily.
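The majority-vote idea can be written down directly. The sketch below is illustrative (not the article's code): it uses Euclidean distance and a simple unweighted vote, with the function name and default k chosen here:

```python
import numpy as np
from collections import Counter

def knn_classify(x, train_X, train_y, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Distance from x to every training sample.
    dists = np.linalg.norm(train_X - x, axis=1)
    # Indices of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # The most frequent class among those neighbors wins.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For a query point sitting inside one of two well-separated groups, the vote returns that group's label.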

The KNN algorithm can be used not only for classification but also for regression: by locating the k nearest neighbors of a sample and assigning it the average of those neighbors' attribute values, you obtain a predicted value for the sample. A more useful approach is to give neighbors at different distances different weights on the prediction, with each weight inversely proportional to the neighbor's distance, so that closer neighbors contribute more.
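A minimal sketch of that distance-weighted regression, assuming 1/d weights (one common choice; the function name and the exact-match shortcut are illustrative, not from the article):

```python
import numpy as np

def knn_regress(x, train_X, train_y, k=3):
    """Predict a value for x as the distance-weighted average of its
    k nearest neighbors; closer neighbors get larger weights (here 1/d)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # If x coincides with a training point, return that point's value directly
    # (avoids dividing by a zero distance).
    if dists[nearest[0]] == 0:
        return float(train_y[nearest[0]])
    weights = 1.0 / dists[nearest]  # weight inversely proportional to distance
    return float(np.dot(weights, train_y[nearest]) / weights.sum())
```

With a 1-D training set where the target equals the input, a query between two training points lands between their values, pulled toward the nearer one.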

The main disadvantage of this algorithm appears when the sample is unbalanced: if one class has a large sample size and another a very small one, then when a new sample is entered, samples of the large class may form the majority among its k nearest neighbors. Distance weighting (giving neighbors closer to the sample larger weights) can be used to improve this. Another disadvantage is the large computational cost: for each sample to be classified, its distance to all known samples must be computed in order to find its k nearest neighbors. A common remedy is to edit the known sample points in advance, removing those that contribute little to classification. The algorithm is well suited to the automatic classification of class domains with large sample sizes; class domains with smaller sample sizes are more prone to misclassification.
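The distance-weighting remedy for unbalanced classes can be illustrated by replacing the plain vote with a 1/d-weighted vote, so a few close neighbors from a small class can outvote many distant neighbors from a large class. This sketch and its names are illustrative, not from the article:

```python
import numpy as np
from collections import defaultdict

def knn_weighted_classify(x, train_X, train_y, k=5):
    """Classify x by a vote weighted by 1/distance: a few close neighbors
    from a small class can outweigh many distant large-class neighbors."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = defaultdict(float)
    for i in nearest:
        # Small epsilon guards against division by zero on an exact match.
        scores[train_y[i]] += 1.0 / (dists[i] + 1e-12)
    return max(scores, key=scores.get)
```

In the test below, four of the five neighbors belong to the large class, so an unweighted vote would pick it; the weighted vote instead picks the small class, whose single sample sits much closer to the query.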

