The KNN (k-nearest neighbors) algorithm is one of the simplest and most easily understood of all machine learning algorithms. KNN is an instance-based learning method: it calculates the distance between a new data point and the feature values of the training data, then selects the K (K >= 1) nearest neighbors and uses them to classify (by voting) or to regress. If K = 1, the new data point is simply assigned the class of its nearest neighbor.

Is the KNN algorithm supervised or unsupervised learning? First, consider the definitions. In supervised learning, the data carries explicit labels (the task is classification when the labels are discrete and regression when they are continuous), and the learned model can assign new data to a definite class or produce a predicted value. In unsupervised learning, the data has no labels, and what the machine learns is a pattern extracted from the data itself (feature extraction, clustering, and so on). In clustering, for example, the model learns groups from the original data and then determines which group the new data is "more like".

When KNN is used for classification, every training point has an explicit label and the new data receives a definite label as well. When KNN is used for regression, it likewise predicts a definite value from the values of the neighbors. KNN therefore belongs to supervised learning.
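
The description above can be written compactly as a prediction rule. The notation here is an assumption added for illustration and does not come from the original note: $x$ is the new data point, $(x_i, y_i)$ are the labeled training points, $d$ is the chosen distance (Euclidean distance is shown as one common choice), $N_k(x)$ is the set of the $k$ nearest training points, and $w_i$ are neighbor weights (for example $w_i = 1$ for a plain average).

$$
d(x, x_i) = \sqrt{\sum_{j} \left(x_{j} - x_{ij}\right)^2}, \qquad
N_k(x) = \{\, i : x_i \text{ is among the } k \text{ training points with the smallest } d(x, x_i) \,\}
$$

$$
\hat{y}_{\text{class}} = \arg\max_{c} \sum_{i \in N_k(x)} \mathbf{1}[\,y_i = c\,], \qquad
\hat{y}_{\text{reg}} = \frac{\sum_{i \in N_k(x)} w_i \, y_i}{\sum_{i \in N_k(x)} w_i}
$$

Both rules depend on the labels $y_i$, which is another way of seeing why KNN counts as supervised learning.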
The KNN algorithm proceeds as follows:
- Choose a distance metric and compute the distance from the new data point to every point in the labeled data set, using all of the features
- Sort the points in ascending order of distance and select the K points with the smallest distances
- For classification, return the most frequent category among the K points as the predicted class; for regression, return the weighted average of the values of the K points
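
The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`euclidean`, `k_nearest`, `knn_classify`, `knn_regress`), the choice of Euclidean distance, the inverse-distance weighting for regression, and the toy data are all hypothetical choices that the original note does not specify.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train_X, train_y, query, k, distance=euclidean):
    """Steps 1 and 2: compute all distances, sort ascending, keep the k closest."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    scored = [(distance(x, query), y) for x, y in zip(train_X, train_y)]
    scored.sort(key=lambda pair: pair[0])  # stable sort: ties keep training order
    return scored[:k]


def knn_classify(train_X, train_y, query, k=3):
    """Step 3 (classification): majority vote among the k nearest labels."""
    neighbors = k_nearest(train_X, train_y, query, k)
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3, eps=1e-9):
    """Step 3 (regression): inverse-distance weighted average (one possible weighting)."""
    neighbors = k_nearest(train_X, train_y, query, k)
    # eps avoids division by zero when the query coincides with a training point
    weights = [1.0 / (dist + eps) for dist, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)


# Toy data: two well-separated groups in a 2-D feature space
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]
targets = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]

if __name__ == "__main__":
    print(knn_classify(points, labels, (1.1, 1.0), k=3))
    print(knn_regress(points, targets, (1.1, 1.0), k=3))
```

Sorting every distance costs O(n log n) per query, which is fine for a small data set; for larger ones, a partial selection such as `heapq.nsmallest` or a spatial index would likely be a better fit.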
KNN algorithm--Birds of a feather, flock together