K-Nearest Neighbor Algorithm Summary


1. Basic Introduction

The K-Nearest Neighbor (KNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms. Its idea is: if the majority of the k samples most similar to a given sample in the feature space (that is, its nearest neighbors) belong to a certain category, then that sample also belongs to this category. In KNN, the selected neighbors are all samples that have already been correctly classified. The method determines the category of a sample to be classified based only on the classes of its one or several nearest neighbors.

Although the KNN method also depends on the limit theorem in principle, its classification decision involves only a small number of adjacent samples. Because KNN relies mainly on a limited set of nearby samples, rather than on discriminating class regions, to determine the category, it is more suitable than other methods for sample sets whose class domains intersect or overlap heavily.

KNN can be used for both classification and regression. For regression, the attribute of a sample can be predicted by finding its k nearest neighbors and assigning it the average of those neighbors' attribute values. A more useful refinement is to give neighbors at different distances different weights on the result, for example weights inversely proportional to distance, so that closer neighbors contribute more.
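
As an illustration, here is a minimal Python sketch of distance-weighted kNN regression; the function name knn_regress, the toy data, and the small eps term that avoids division by zero are illustrative assumptions, not part of the original text.

    import numpy as np

    def knn_regress(X_train, y_train, x, k=3, eps=1e-8):
        """Predict a value for x as the inverse-distance-weighted
        average of its k nearest neighbors' target values."""
        # Euclidean distance from x to every training sample
        dists = np.linalg.norm(X_train - np.asarray(x), axis=1)
        # Indices of the k nearest neighbors
        nn = np.argsort(dists)[:k]
        # Weights inversely proportional to distance (eps avoids division by zero)
        w = 1.0 / (dists[nn] + eps)
        return float(np.dot(w, y_train[nn]) / w.sum())

    # Example: noisy samples of y = 2x; predict at x = 2.5
    X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
    y_train = np.array([0.1, 2.0, 3.9, 6.1, 8.0])
    print(knn_regress(X_train, y_train, [2.5], k=3))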

The main disadvantage of the algorithm in classification is its behavior on imbalanced data: when one class has a large sample size and the other classes are small, the k nearest neighbors of a new input sample may be dominated by the large class simply because of its size. The algorithm considers only the "nearest" neighbor samples; a class with a very large number of samples may or may not happen to lie close to the target sample, and either way sheer sample count should not be what decides the result. This can be improved by weighting the vote so that neighbors at smaller distances from the sample carry larger weights, as sketched below.

Another disadvantage is the heavy computational workload: for each sample to be classified, the distance to every known sample must be computed before its k nearest neighbors can be found. A common remedy is to edit the known sample points in advance, removing those that contribute little to classification. Overall, the algorithm is better suited to automatic classification of class domains with large sample sizes; class domains with small sample sizes are prone to misclassification under this algorithm.
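
A minimal sketch of the distance-weighted vote suggested above; the function name weighted_knn_classify and the 1/distance weighting scheme are illustrative choices, not prescribed by the original text.

    from collections import defaultdict
    import numpy as np

    def weighted_knn_classify(X_train, y_train, x, k=5, eps=1e-8):
        """Classify x by a vote among its k nearest neighbors, where each
        neighbor's vote is weighted by 1/distance, so that close neighbors
        count more than distant ones from a large class."""
        dists = np.linalg.norm(X_train - np.asarray(x), axis=1)
        nearest = np.argsort(dists)[:k]   # indices of the k nearest samples
        votes = defaultdict(float)
        for i in nearest:
            votes[y_train[i]] += 1.0 / (dists[i] + eps)
        return max(votes, key=votes.get)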

 

2. Algorithm Description

The idea of the k-NN algorithm is as follows: first, compute the distance between the new sample and each training sample and find its k nearest neighbors; then decide the class of the new sample from the classes of those neighbors. If the k neighbors all belong to the same class, the new sample belongs to that class as well; otherwise, each candidate class is scored, and the category of the new sample is determined according to some rule, most commonly a majority vote.

Take the k nearest neighbors of an unknown sample x; x is assigned to the class that the majority of those k neighbors belong to. Equivalently, the algorithm grows a region around the test sample x until it contains k training samples, and assigns x to the class with the highest frequency of occurrence among those k samples. For example, in the classic illustration, a green circle must be assigned either to the class of red triangles or to the class of blue squares. If k = 3, red triangles make up 2/3 of the neighbors, so the green circle is assigned to the red-triangle class; if k = 5, blue squares make up 3/5 of the neighbors, so the green circle is assigned to the blue-square class. A small sketch of this example follows.
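
The k = 3 versus k = 5 effect described above can be reproduced with a few lines of Python; the 2-D coordinates and labels below are invented purely for illustration.

    from collections import Counter
    import numpy as np

    def knn_classify(X_train, y_train, x, k):
        """Majority vote among the k nearest training samples."""
        dists = np.linalg.norm(X_train - np.asarray(x), axis=1)
        nn = np.argsort(dists)[:k]
        return Counter(y_train[i] for i in nn).most_common(1)[0][0]

    # Two red triangles close to the query, three blue squares farther out
    X_train = np.array([[1.0, 1.0], [1.2, 0.8],               # triangles
                        [2.0, 2.0], [2.2, 1.8], [1.9, 2.3]])  # squares
    y_train = ["triangle", "triangle", "square", "square", "square"]
    x = [1.4, 1.2]

    print(knn_classify(X_train, y_train, x, k=3))  # triangle (2/3 of neighbors)
    print(knn_classify(X_train, y_train, x, k=5))  # square   (3/5 of neighbors)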

Algorithm pseudocode:

K-nearest-neighbor search algorithm: kNN(A[n], k)

Input: A[1..n], the coordinates of the n training samples in feature space; k, the number of nearest neighbors; x, the sample to be classified.

Output: Category of x

1. Take A[1], ..., A[k] as the initial k nearest neighbors of x. Compute the Euclidean distance d(x, A[i]) for i = 1, ..., k, sort the neighbors in ascending order of d(x, A[i]), and set the distance to the farthest current neighbor, D = max{d(x, A[j]) | j = 1, ..., k}.

2. For i = k+1, ..., n:
   compute the distance d(x, A[i]) between A[i] and x;
   if d(x, A[i]) < D, replace the farthest current neighbor with A[i], re-sort the neighbors in ascending order of distance, and update D to the maximum of d(x, A[j]) over the current k neighbors.

3. Among the final k nearest neighbors, compute the proportion of samples belonging to each category; the category with the highest proportion is the category of sample x.
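
A runnable Python sketch of the search procedure above: it keeps a pool of k candidate neighbors and replaces the farthest one whenever a closer sample is found, then takes a majority vote. The function name knn_search_classify and the toy data are illustrative assumptions.

    from collections import Counter
    import numpy as np

    def knn_search_classify(A, labels, x, k):
        """kNN(A[n], k): classify x by the pseudocode's procedure."""
        n = len(A)
        x = np.asarray(x)

        # Step 1: take A[0..k-1] as the initial neighbors, sorted by distance
        neighbors = sorted(range(k), key=lambda i: np.linalg.norm(A[i] - x))
        D = np.linalg.norm(A[neighbors[-1]] - x)  # distance to farthest neighbor

        # Step 2: scan A[k..n-1], replacing the farthest neighbor when a
        # closer sample appears, and keeping the pool sorted by distance
        for i in range(k, n):
            d = np.linalg.norm(A[i] - x)
            if d < D:
                neighbors[-1] = i
                neighbors.sort(key=lambda j: np.linalg.norm(A[j] - x))
                D = np.linalg.norm(A[neighbors[-1]] - x)

        # Step 3: the most frequent class among the k neighbors wins
        return Counter(labels[j] for j in neighbors).most_common(1)[0][0]

    # Usage on a toy dataset with two well-separated classes
    A = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_search_classify(A, labels, [0.5, 0.5], k=3))  # -> "a"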

 
