A brief introduction to K-means and KNN algorithms

Source: Internet
Author: User

K-means algorithm

The K-means algorithm takes an input k and partitions the N data objects into k clusters such that objects within the same cluster are highly similar to one another, while objects in different clusters have low similarity. Cluster similarity is computed with respect to a "center object" (centroid), obtained as the mean of the objects in each cluster.

The working process of the K-means algorithm is as follows. First, k of the N data objects are selected as the initial cluster centers. Each remaining object is then assigned to the cluster whose center (the cluster's representative) it is most similar to, i.e., closest in distance. Next, the center of each new cluster is recomputed as the mean of all objects in that cluster. This assignment-and-update process repeats until the standard measure function converges; the mean squared error is generally used as the measure. The resulting k clusters have the following characteristics: each cluster is internally as compact as possible, and the clusters are as well separated from one another as possible.
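The iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's own code; the function name, the random initialization, and the convergence test on the centers are all choices made here for brevity (it also assumes no cluster ever becomes empty):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-means sketch: X is an (N, d) array, k the cluster count."""
    rng = np.random.default_rng(seed)
    # Step 1: select k of the N objects as the initial cluster centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every object to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the mean of the objects in its cluster
        # (assumes each cluster keeps at least one object).
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Stop once the centers no longer move, i.e. the measure has converged.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

On two well-separated groups of points, the two returned clusters line up with the groups regardless of which objects happen to be picked as initial centers.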

K-Nearest Neighbor (KNN) classification algorithm

The KNN classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. Its idea: if most of the k samples most similar to a given sample in feature space (that is, its nearest neighbors there) belong to a certain category, then the sample belongs to that category as well. In KNN, the selected neighbors are objects that have already been correctly classified, and the method decides the category of a new sample based only on the category of its nearest one or few samples. Although the KNN method relies on the limit theorem in principle, its classification decisions involve only a small number of neighboring samples. Because KNN depends mainly on a limited set of surrounding samples rather than on discriminating class domains, it is better suited than other methods to sample sets whose class domains cross or overlap heavily.
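The majority-vote idea can be written down directly. The sketch below is illustrative (not the article's code): it uses Euclidean distance and a simple unweighted vote, with the function name and default k chosen here:

```python
import numpy as np
from collections import Counter

def knn_classify(x, train_X, train_y, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Distance from x to every training sample.
    dists = np.linalg.norm(train_X - x, axis=1)
    # Indices of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # The most frequent class among those neighbors wins.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For a query point sitting inside one of two well-separated groups, the vote returns that group's label.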

The KNN algorithm can be used not only for classification but also for regression: by locating the k nearest neighbors of a sample and assigning it the average of those neighbors' attribute values, you obtain a predicted value for the sample. A more useful approach is to give neighbors at different distances different weights on the prediction, with each weight inversely proportional to the neighbor's distance, so that closer neighbors contribute more.
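A minimal sketch of that distance-weighted regression, assuming 1/d weights (one common choice; the function name and the exact-match shortcut are illustrative, not from the article):

```python
import numpy as np

def knn_regress(x, train_X, train_y, k=3):
    """Predict a value for x as the distance-weighted average of its
    k nearest neighbors; closer neighbors get larger weights (here 1/d)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # If x coincides with a training point, return that point's value directly
    # (avoids dividing by a zero distance).
    if dists[nearest[0]] == 0:
        return float(train_y[nearest[0]])
    weights = 1.0 / dists[nearest]  # weight inversely proportional to distance
    return float(np.dot(weights, train_y[nearest]) / weights.sum())
```

With a 1-D training set where the target equals the input, a query between two training points lands between their values, pulled toward the nearer one.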

The main disadvantage of this algorithm appears when the sample is unbalanced: if one class has a large sample size and another a very small one, then when a new sample is entered, samples of the large class may form the majority among its k nearest neighbors. Distance weighting (giving neighbors closer to the sample larger weights) can be used to improve this. Another disadvantage is the large computational cost: for each sample to be classified, its distance to all known samples must be computed in order to find its k nearest neighbors. A common remedy is to edit the known sample points in advance, removing those that contribute little to classification. The algorithm is well suited to the automatic classification of class domains with large sample sizes; class domains with smaller sample sizes are more prone to misclassification.
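The distance-weighting remedy for unbalanced classes can be illustrated by replacing the plain vote with a 1/d-weighted vote, so a few close neighbors from a small class can outvote many distant neighbors from a large class. This sketch and its names are illustrative, not from the article:

```python
import numpy as np
from collections import defaultdict

def knn_weighted_classify(x, train_X, train_y, k=5):
    """Classify x by a vote weighted by 1/distance: a few close neighbors
    from a small class can outweigh many distant large-class neighbors."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = defaultdict(float)
    for i in nearest:
        # Small epsilon guards against division by zero on an exact match.
        scores[train_y[i]] += 1.0 / (dists[i] + 1e-12)
    return max(scores, key=scores.get)
```

In the test below, four of the five neighbors belong to the large class, so an unweighted vote would pick it; the weighted vote instead picks the small class, whose single sample sits much closer to the query.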

