1. The k-nearest neighbor algorithm
The k-nearest neighbor method (k-NN) is a basic classification and regression method. Its input is the feature vector of an instance, and its output is the class of the instance, where the class may be one of multiple classes. To classify a new instance, the method finds the k training instances closest to it and predicts by majority vote among their classes.
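The classification step described above can be sketched as follows. This is a minimal illustration, not the book's pseudocode; the function names (`knn_predict`, `euclidean`) are my own, and it uses brute-force search over the training set rather than a kd-tree.

```python
from collections import Counter
import math

def euclidean(a, b):
    # Straight-line (L2) distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, labels, query, k=3):
    # Find the indices of the k training points nearest to the query,
    # then return the majority class among those k neighbors.
    nearest = sorted(range(len(train)),
                     key=lambda i: euclidean(train[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For example, with two well-separated clusters labeled 'a' and 'b', a query near the 'a' cluster is assigned class 'a'.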
2. The k-nearest neighbor model
2.1 Distance metrics
The k-NN model depends on how distance between feature vectors is measured. For feature vectors xi and xj, the Lp distance is defined as
Lp(xi, xj) = (Σ_l |xi(l) − xj(l)|^p)^(1/p), p ≥ 1,
where xi(l) denotes the l-th component of xi.
(1) When p = 1, it is called the Manhattan distance: L1(xi, xj) = Σ_l |xi(l) − xj(l)|.
(2) When p = 2, it is called the Euclidean distance: L2(xi, xj) = (Σ_l |xi(l) − xj(l)|^2)^(1/2).
(3) When p → ∞, it is the maximum of the coordinate-wise distances: L∞(xi, xj) = max_l |xi(l) − xj(l)|.
Note: the choice of p affects the classification result. For example, take three points in two-dimensional space: x1 = (1, 1), x2 = (5, 1), x3 = (4, 4).
Since x1 and x2 differ only in the first dimension, Lp(x1, x2) = 4 regardless of the value of p, while L1(x1, x3) = 3 + 3 = 6, L2(x1, x3) = (9 + 9)^(1/2) ≈ 4.24, L3(x1, x3) = (27 + 27)^(1/3) ≈ 3.78, L4(x1, x3) ≈ 3.57, ...
So when p = 1 or p = 2, x2 is the nearest neighbor of x1, but when p ≥ 3, x3 becomes the nearest neighbor.
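The numbers above can be checked directly with a small Lp distance function (the name `lp_distance` is my own):

```python
def lp_distance(a, b, p):
    # Lp distance: (sum over components of |a_i - b_i|^p)^(1/p), p >= 1.
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

x1, x2, x3 = (1, 1), (5, 1), (4, 4)
for p in (1, 2, 3, 4):
    print(p, lp_distance(x1, x2, p), round(lp_distance(x1, x3, p), 2))
```

Lp(x1, x2) stays at 4 for every p, while Lp(x1, x3) shrinks from 6 past 4 as p grows, which is exactly why the identity of x1's nearest neighbor flips at p = 3.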
2.2 Choice of k
In applications, k is generally taken to be a relatively small value, and cross-validation is usually used to select the best k.
When k is small, the model is more complex and sensitive to noisy neighbors, so it is prone to overfitting.
When k is large, the model is simpler, but more distant (and less relevant) training points influence the prediction.
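The cross-validation procedure mentioned above can be sketched with leave-one-out validation on a toy dataset; the helper names (`knn_predict`, `loo_accuracy`) and the candidate set of k values are my own choices, not from the book.

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k):
    # Brute-force k-NN: majority class among the k nearest training points.
    nearest = sorted(range(len(train)),
                     key=lambda i: math.dist(train[i], query))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

def loo_accuracy(points, labels, k):
    # Leave-one-out cross-validation: predict each point from all the others
    # and report the fraction predicted correctly.
    correct = 0
    for i in range(len(points)):
        rest = points[:i] + points[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        if knn_predict(rest, rest_labels, points[i], k) == labels[i]:
            correct += 1
    return correct / len(points)

points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
labels = ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b']
best_k = max((1, 3, 5), key=lambda k: loo_accuracy(points, labels, k))
```

On real data one would typically use k-fold rather than leave-one-out validation, and plot accuracy against k to see the bias/variance trade-off directly.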
2.3 Classification decision rule
The most common rule is the majority voting rule: the query is assigned the class that appears most often among its k nearest neighbors. Under the 0-1 loss function, majority voting is equivalent to empirical risk minimization: choosing the majority class minimizes the fraction of the k neighbors that are misclassified.
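The equivalence above can be made concrete: the class picked by majority vote is exactly the one that minimizes the empirical 0-1 risk over the k neighbors. A small sketch (the function name `majority_vote` is my own):

```python
from collections import Counter

def majority_vote(neighbor_labels):
    # The majority class is the one minimising the misclassification
    # (0-1 loss) rate over the k neighbors: risk = 1 - (votes / k).
    counts = Counter(neighbor_labels)
    winner, n = counts.most_common(1)[0]
    empirical_risk = 1 - n / len(neighbor_labels)
    return winner, empirical_risk
```

For neighbor labels ['a', 'b', 'a', 'a', 'b'], the vote returns 'a' with empirical risk 2/5, since any other choice would misclassify at least 3 of the 5 neighbors.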
Reference: Li Hang, Statistical Learning Methods — Algorithm 2: the k-nearest neighbor method.