While reading the book Machine Learning in Action, I took some notes along the way and would like to share them below:
I. Overview of the k-nearest neighbor algorithm (KNN)
The simplest, most naive classifier records all of the classes in the training data and assigns a test object to a class only when its attributes match those of a training object exactly. But it is unlikely that every test object will find an exact match among the training objects; moreover, a test object may match more than one training object at the same time, which would assign a single test object to multiple classes. KNN was developed to address these problems.
KNN classifies by measuring the distance between feature vectors. The idea is: if most of the k samples most similar to a given sample in feature space (that is, its nearest neighbors) belong to a certain category, then the sample belongs to that category too; k is usually an integer no greater than 20. In the KNN algorithm, the selected neighbors are all objects that have already been correctly classified, and the method decides the category of the sample to be classified based only on the categories of the nearest one or few samples.
The figure below gives a simple example: to which class should the green circle be assigned, the red triangles or the blue squares? If k = 3, the red triangles make up 2/3 of the neighbors, so the green circle is assigned to the red-triangle class; if k = 5, the blue squares make up 3/5 of the neighbors, so the green circle is assigned to the blue-square class.
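To make this concrete, here is a minimal Python sketch of the figure's scenario. The coordinates are my own invented stand-ins for the picture, chosen so that the three nearest neighbors are mostly red triangles while the five nearest are mostly blue squares:

    import numpy as np
    from collections import Counter

    # Hypothetical coordinates standing in for the figure: the green circle
    # is the query point; red triangles and blue squares are training samples.
    query = np.array([0.0, 0.0])                                # green circle
    points = np.array([[0.0, 1.0], [1.0, 0.0],                  # red triangles
                       [0.0, -1.2], [-1.5, 0.0], [1.5, 1.5]])   # blue squares
    labels = ['red', 'red', 'blue', 'blue', 'blue']

    dists = np.sqrt(((points - query) ** 2).sum(axis=1))  # Euclidean distances
    order = np.argsort(dists)                              # indices, nearest first

    for k in (3, 5):
        votes = Counter(labels[i] for i in order[:k])
        print(k, votes.most_common(1)[0][0])               # 3 -> 'red', 5 -> 'blue'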
II. Algorithm pseudo-code
Algorithm for searching the k nearest neighbors: KNN(A[n], k)
Input: A[n], the coordinates of the n training samples in feature space; k, the number of nearest neighbors
Output: the category that sample x belongs to

Take A[1]..A[k] as the initial k neighbors of x; compute the Euclidean distance d(x, A[i]) between x and each of them, i = 1, 2, ..., k; sort the neighbors in ascending order of d(x, A[i]) and record the distance from x to the farthest of them: D <- max{ d(x, A[j]) | j = 1, ..., k };
for (i = k+1; i <= n; i++)
    compute the distance d(x, A[i]) between A[i] and x;
    if (d(x, A[i]) < D)
        then replace the farthest neighbor with A[i];
        re-sort the neighbors in ascending order of d(x, A[·]) and update D <- max{ d(x, A[j]) | j over the current k neighbors };
Finally, compute the frequency of each category among the k nearest samples; the category with the highest frequency is the category of sample x.
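As a runnable counterpart, here is a minimal Python sketch of the same algorithm. Instead of maintaining a running set of k neighbors as the pseudo-code does, it sorts all n distances at once, which gives the same result; the function and variable names are my own:

    import numpy as np
    from collections import Counter

    def knn_classify(x, A, labels, k):
        """Return the majority category among the k training samples nearest to x.

        A      -- (n, d) array: coordinates of the n training samples
        labels -- length-n sequence: the category of each training sample
        k      -- number of nearest neighbors to consult
        """
        # Euclidean distance from x to every training sample -- the brute-force
        # scan that makes kNN expensive on large training sets
        dists = np.sqrt(((A - x) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]        # indices of the k smallest distances
        votes = Counter(labels[i] for i in nearest)
        return votes.most_common(1)[0][0]      # category with the highest frequency

For example, with four training points in two classes:

    A = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    print(knn_classify(np.array([0.1, 0.2]), A, labels, k=3))   # -> 'B'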
III. Algorithm summary
The k-nearest neighbor algorithm is the simplest and most effective algorithm for classifying data. It is an instance-based learning method, so using it requires a set of training samples close to the actual data (that is, representative of it). The algorithm must keep all of the training data, which can take up a great deal of storage space when the dataset is large. Moreover, to classify a single sample it must compute the distance to every training sample, which can be very time consuming. An improvement on the k-nearest neighbor algorithm is the decision tree.