K-Nearest Neighbor (KNN) Algorithm

The K-Nearest Neighbor (KNN, k-nearest neighbor) classification algorithm is one of the simplest methods in data mining classification. "K nearest neighbors" means the k training samples closest to a given point: the idea is that every sample can be represented by its k nearest neighbors. The core idea of KNN is that if the majority of the k samples nearest to a point in feature space belong to a certain category, then that point is assigned to the same category and shares the characteristics of the samples in it. The classification decision therefore depends only on the categories of one or a few neighboring samples, not on a discriminant over whole class regions. Because KNN relies mainly on a limited number of surrounding samples rather than on class domains, it can be better suited than discriminant-based methods when class regions overlap or are hard to separate.

KNN classifies by measuring the distance between feature vectors: a sample is assigned to the category that is most common among its k nearest neighbors in feature space. Here k is usually an integer no greater than 20, and the selected neighbors are objects whose categories are already known, i.e., correctly labeled training samples.
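As a concrete illustration of this idea (not part of the original post), here is a minimal sketch using scikit-learn's KNeighborsClassifier; the four training points and their labels are invented for the example:

    # A minimal KNN classification sketch with scikit-learn.
    # The training points and labels below are made up for illustration.
    from sklearn.neighbors import KNeighborsClassifier

    X_train = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]  # feature vectors
    y_train = ["A", "A", "B", "B"]                               # known labels

    model = KNeighborsClassifier(n_neighbors=3)  # k = 3
    model.fit(X_train, y_train)                  # KNN simply stores the training data
    print(model.predict([[0.9, 0.8]]))           # -> ['A']: 2 of its 3 nearest neighbors are A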

A classic illustration: suppose a green circle must be assigned to one of two classes, the red triangles or the blue squares. If k = 3, the red triangles make up 2/3 of its nearest neighbors, so the green circle is assigned to the red-triangle class; if k = 5, the blue squares make up 3/5 of its nearest neighbors, so the green circle is assigned to the blue-square class.
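This flip can be reproduced in a few lines. The 2-D coordinates below are invented so that the three nearest neighbors of the query point are mostly red triangles while the five nearest are mostly blue squares:

    from collections import Counter
    import math

    # Labeled points: the layout is made up so the vote flips between k=3 and k=5.
    points = [((0.5, 0.0), "red triangle"),
              ((0.8, 0.0), "blue square"),
              ((0.0, 1.0), "red triangle"),
              ((1.2, 0.0), "blue square"),
              ((0.0, 1.5), "blue square")]
    query = (0.0, 0.0)  # the "green circle" to be classified

    # Rank the labeled points by Euclidean distance to the query point.
    ranked = sorted(points, key=lambda p: math.dist(query, p[0]))

    for k in (3, 5):
        votes = Counter(label for _, label in ranked[:k])
        print(k, votes.most_common(1)[0][0])
    # prints: 3 red triangle, then 5 blue square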

This also shows that the result of the KNN algorithm depends largely on the choice of k.

In KNN, the distance between objects is calculated as a measure of their dissimilarity, which avoids the problem of matching objects directly; the distance used is generally the Euclidean distance or the Manhattan distance. For feature vectors x = (x1, ..., xn) and y = (y1, ..., yn), the Euclidean distance is sqrt((x1 - y1)^2 + ... + (xn - yn)^2), and the Manhattan distance is |x1 - y1| + ... + |xn - yn|.
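A direct translation of the two formulas into code might look like this sketch, for feature vectors given as equal-length sequences of numbers:

    import math

    def euclidean(x, y):
        # Square root of the sum of squared coordinate differences.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    def manhattan(x, y):
        # Sum of absolute coordinate differences.
        return sum(abs(a - b) for a, b in zip(x, y))

    print(euclidean([0, 0], [3, 4]))  # 5.0
    print(manhattan([0, 0], [3, 4]))  # 7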

At the same time, KNN makes its decision based on the dominant category among the k surrounding objects rather than on the category of a single object. These two points are advantages of the KNN algorithm.

To summarize the KNN algorithm: the training data and their labels are known in advance. Given a test sample, its features are compared with the corresponding features of every sample in the training set, the k most similar training samples are found, and the category assigned to the test sample is the one that occurs most often among those k samples. The algorithm can be described as follows (a from-scratch sketch follows the list):

1) Calculate the distance between the test sample and each training sample;

2) Sort by increasing distance;

3) Select the k points with the smallest distances;

4) Determine the frequency of each category among those k points;

5) Return the most frequent category among those k points as the predicted classification of the test sample.
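Here is the promised from-scratch sketch of steps 1) through 5); the training data, labels, and test point are invented for illustration:

    from collections import Counter
    import math

    def knn_predict(train_X, train_y, test_x, k):
        # 1) Calculate the distance between the test sample and each training sample.
        distances = [(math.dist(test_x, x), label) for x, label in zip(train_X, train_y)]
        # 2) Sort by increasing distance.
        distances.sort(key=lambda d: d[0])
        # 3) Select the k points with the smallest distances.
        nearest = distances[:k]
        # 4) Determine the frequency of each category among those k points.
        votes = Counter(label for _, label in nearest)
        # 5) Return the most frequent category as the prediction.
        return votes.most_common(1)[0][0]

    # Invented training data: two classes in a 2-D feature space.
    train_X = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
    train_y = ["A", "A", "B", "B"]
    print(knn_predict(train_X, train_y, [0.9, 0.8], k=3))  # -> A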

Disadvantages:

1) No model is built: each test or prediction sample must be compared against all training samples, so when the training and test sets are large, the amount of computation is very high;

2) K-nearest neighbor cannot give an interpretable model, since it does not generate a model at all.

Note: KNN works by calculating the similarity (or distance) between samples. Since a sample's category is determined by the most frequent category among its k nearest neighbors, pay attention to the choice of k. In particular, k = 1 is usually not sufficient to determine the category of a test sample, because the data contain noise and outliers; a larger neighborhood set is needed to determine the category reliably.
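One common way to handle the choice of k (a suggestion, not something prescribed by the original post) is to score several odd values of k on a held-out validation split and keep the best one. The sketch below uses scikit-learn and its bundled iris dataset:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    # Try a few odd values of k (odd k avoids ties in two-class votes)
    # and report validation accuracy for each.
    for k in (1, 3, 5, 7, 9):
        acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
        print(k, acc)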

References:

https://www.cnblogs.com/ybjourney/p/4702562.html

Web Data Mining (2nd edition), Bing Liu; Chinese translation by Yong Yu et al. (Chapter 3 "Supervised Learning", Section 3.9 "K-Nearest Neighbor Learning")
