I. Overview
The k-Nearest Neighbor (KNN) algorithm classifies samples by the Nearest Neighbor Rule.
The original nearest neighbor algorithm was proposed by Cover and Hart in 1968.
KNN is a classification algorithm.
It is an example of instance-based learning, also known as lazy learning: no explicit model is built in advance, and computation is deferred until a new sample needs to be classified.
II. Principle
There is a sample data set, also called the training sample set, in which every sample carries a label; that is, we know the correspondence between each sample and the category it belongs to. When new data without a label is entered, each feature of the new data is compared with the corresponding features of the samples in the training set, and the algorithm extracts the classification labels of the k most similar samples (the nearest neighbors). Typically k is an integer no larger than 20, and the new data is assigned to the category that appears most often among its k nearest neighbors.
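The procedure above can be sketched in a few lines of Python. This is only a minimal illustration (brute-force search over all samples, Euclidean distance, simple majority vote with no tie-breaking), and the function and variable names are hypothetical:

```python
from collections import Counter

import numpy as np

def knn_classify(x, train_X, train_y, k=3):
    """Classify x by majority vote among its k nearest training samples."""
    # Euclidean distance from x to every training sample
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))
    # Indices of the k closest samples
    nearest = np.argsort(dists)[:k]
    # Majority vote over the labels of those neighbors
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy example: two small clusters, "A" near (1, 1) and "B" near (0, 0)
train_X = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
train_y = ["A", "A", "B", "B"]
print(knn_classify(np.array([0.1, 0.1]), train_X, train_y, k=3))  # "B"
```

Note that this brute-force version computes a distance to every stored sample on every query, which is exactly the storage and running-time cost discussed in the advantages-and-disadvantages section below.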
III. Distance measures
1. Euclidean distance
2. Cosine similarity (cos)
3. Correlation
4. Manhattan distance
IV. Advantages and Disadvantages
The k-nearest neighbor algorithm is among the simplest and most effective algorithms for classifying data. Because it is an instance-based learner, it must carry the training samples into the actual computation: the algorithm has to keep the entire training data set, so a large training set demands a large amount of storage space. In addition, since a distance must be computed between the query and every sample in the data set, it can be very time-consuming in practice.
Another drawback of the k-nearest neighbor algorithm is that it gives no information about the underlying structure of the data, so we cannot know what an average sample or a typical sample of each category looks like. Later we will use probability-based methods to handle classification problems.
Supervised learning classification algorithms in ML, Part 1: the k-Nearest Neighbor (KNN) algorithm