Supervised Learning: Nearest Neighbor Algorithm (KNN, K-Nearest Neighbor Algorithm)


In pattern recognition, the nearest neighbor method (the KNN, or k-nearest neighbor, algorithm) classifies a sample according to the closest training samples in the feature space.

The nearest neighbor method uses the vector space model for classification. The idea is that cases of the same category are highly similar to one another, so by computing the similarity between an unknown case and cases of known categories, we can estimate the likely category of the unknown case.

Excerpt from: Wikipedia


Take handwritten digit recognition as an example. Suppose each of the digits 0-9 can be represented by a feature vector (a, b, c, ...), and we now have an unknown digit X whose feature vector is (x, y, z, ...). The distance between the two feature vectors, the Euclidean distance d = sqrt((x-a)^2 + (y-b)^2 + (z-c)^2 + ...), can be used to measure their similarity: the smaller d is, the closer X is to that digit, that is, the more likely X is that digit. The vectors (a, b, c, ...) make up the training set (there may be many of them, say 10,000 samples, 1,000 for each of the digits 0-9), while (x, y, z, ...) is a test sample. We compute the Euclidean distance between (x, y, z, ...) and each of the 10,000 training vectors, record each distance, and find the smallest d (the greatest similarity); the label of that nearest training sample is the digit that (x, y, z, ...) represents.
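A minimal sketch of this single-nearest-neighbor lookup in Python (the array names, the 10,000 x 64 shapes, and the random data are illustrative assumptions, not part of the original example):

import numpy as np

def nearest_neighbor(test_vec, train_vecs, train_labels):
    # Euclidean distance from the test vector to every training vector
    dists = np.sqrt(((train_vecs - test_vec) ** 2).sum(axis=1))
    # Label of the single closest training sample (smallest d)
    return train_labels[np.argmin(dists)]

# Illustrative data: 10,000 training vectors (1,000 per digit 0-9), 64 features each
train_vecs = np.random.rand(10000, 64)
train_labels = np.repeat(np.arange(10), 1000)
test_vec = np.random.rand(64)
print(nearest_neighbor(test_vec, train_vecs, train_labels))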

So what is the K in KNN for?

After computing the 10,000 Euclidean distances, we take the K smallest distances (the K highest similarities) and look at which digits those K training samples belong to; the digit that appears most often is the one we assign to (x, y, z, ...). For example, with k = 10, if 5 of the K nearest samples correspond to the digit 1, 3 correspond to the digit 2, and 2 correspond to the digit 8, then we say (x, y, z, ...) is a 1. It is like electing a village chief: whoever gets the most votes wins. What if k = 1? That is exactly the case above: just find the single most similar training sample.
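A minimal sketch of this voting step, reusing the illustrative train_vecs and train_labels arrays assumed in the previous snippet:

from collections import Counter
import numpy as np

def knn_classify(test_vec, train_vecs, train_labels, k=10):
    # Euclidean distance to every training vector
    dists = np.sqrt(((train_vecs - test_vec) ** 2).sum(axis=1))
    # Indices of the k smallest distances (the k nearest neighbors)
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbors' labels
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

With k=1 this reduces to the single-nearest-neighbor case described earlier.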


Advantages: high accuracy, insensitive to outliers, no assumptions about the input data

Disadvantages: high computational complexity and high space complexity (classifying each new sample means traversing the entire training set, e.g. all 10,000 samples above)

Applicable data types: numeric and nominal values (nominal meaning discrete data whose values come from a finite set of possible categories)

