In the field of pattern recognition, the nearest neighbor method (the KNN, or k-nearest neighbors, algorithm) classifies a sample according to the closest training samples in the feature space.
The nearest neighbor method classifies using a vector space model: cases of the same category are highly similar to one another, so we can compute the similarity between an unknown case and cases of known categories to estimate which category the unknown case most likely belongs to.
Excerpt from: Wikipedia
For example, take handwritten digit recognition. Suppose each of the digits 0-9 can be represented by a feature vector (a, b, c, ...), and we now have an unknown digit x whose feature vector is (x, y, z, ...). Then we can use the distance between the feature vectors (the Euclidean distance) to measure their similarity: d = sqrt((x-a)^2 + (y-b)^2 + (z-c)^2 + ...). The smaller d is, the closer x is to that digit, i.e. the more likely it is that x is that digit. Here the vectors (a, b, c, ...) form the training set (there can be many of them, say 10,000: 1,000 for each of the digits 0-9), and (x, y, z, ...) is a test sample. We compute the similarity (Euclidean distance) between (x, y, z, ...) and each of the 10,000 training vectors (a, b, c, ...), record them all, and find the smallest d (the highest similarity); the digit of that training vector is what (x, y, z, ...) is.
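Here is a minimal Python sketch of that brute-force distance search, assuming a training set of 10,000 flattened images; the names (euclidean_distance, train_vectors, train_labels) and the random stand-in data are illustrative, not from the original:

```python
import numpy as np

def euclidean_distance(a, b):
    # d = sqrt((x-a)^2 + (y-b)^2 + (z-c)^2 + ...)
    return np.sqrt(np.sum((a - b) ** 2))

# Stand-in training set: 10,000 feature vectors (1,000 per digit 0-9),
# filled with random numbers purely for illustration.
rng = np.random.default_rng(0)
train_vectors = rng.random((10000, 64))        # e.g. 8x8 images, flattened
train_labels = np.repeat(np.arange(10), 1000)  # 1,000 samples per digit

def nearest_neighbor(x):
    # Distance from x to every training vector, then the label of the
    # single closest one (the k = 1 case described above).
    distances = [euclidean_distance(x, t) for t in train_vectors]
    return train_labels[int(np.argmin(distances))]
```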
What's going on with K?
Among the 10,000 similarities (Euclidean distances) we computed, we find the K smallest distances d (the K highest similarities), then count which digit each of those K distances corresponds to; whichever digit occurs most often is what we declare (x, y, z, ...) to be. For example, let k = 10: if 5 of the distances correspond to the digit 1, 3 correspond to the digit 2, and 2 correspond to the digit 8, then we say (x, y, z, ...) is 1. It is like a village electing its chief: whoever gets the most votes becomes chief. And what if we set k = 1? That is exactly the example above: we only need to find the single most similar sample.
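Continuing the sketch above, a hedged take on the K-neighbor vote (the function name knn_classify is made up here; collections.Counter does the "village chief" tally):

```python
from collections import Counter

def knn_classify(x, k=10):
    # Distance from x to all 10,000 training vectors (the expensive part).
    distances = np.array([euclidean_distance(x, t) for t in train_vectors])
    # Indices of the k smallest distances (highest similarities).
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k neighbors' labels: e.g. if 5 vote "1",
    # 3 vote "2", and 2 vote "8", the answer is 1.
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# k = 1 reduces to the plain nearest-neighbor case above.
print(knn_classify(rng.random(64), k=10))
```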
Advantages: high accuracy, insensitive to outliers, no assumptions about the input data.
Disadvantages: high computational complexity and high space complexity (really very high: classifying each sample requires traversing the entire training set, all 10,000 samples here).
Applicable data range: numeric and nominal values (nominal meaning discrete data whose values are drawn from a finite set of categories).