Formally, the k-nearest neighbor algorithm (k-NN) works as follows: given a training data set and a new input instance, find the k instances in the training set that are closest to the new instance (its k nearest neighbors). Whichever class the majority of those k instances belong to, the new instance is assigned to that class. This is a supervised learning algorithm.
For example, suppose the red and blue points represent known, labeled training data, and a new instance arrives: the green circle in the figure. To which category does the green circle belong?
- If k = 3, the 3 nearest neighbors of the green point are 2 red triangles and 1 blue square. By majority vote, the green point is assigned to the red triangle class.
- If k = 5, the 5 nearest neighbors of the green point are 2 red triangles and 3 blue squares. Again by majority vote, the green point is assigned to the blue square class.
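The two cases above can be sketched in a few lines of Python. The coordinates below are hypothetical, chosen only so that the green query point at the origin sees 2 red triangles first and then 3 blue squares, reproducing the flip between k = 3 and k = 5:

```python
from collections import Counter
import math

def knn_classify(query, points, labels, k):
    # Rank all training points by Euclidean distance to the query.
    order = sorted(range(len(points)),
                   key=lambda i: math.dist(query, points[i]))
    # Majority vote among the k nearest neighbors.
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2D layout mirroring the figure: 2 red triangles
# closest to the green point, 3 blue squares slightly farther out.
points = [(1, 0), (0, 1),                # red triangles
          (1.5, 0), (0, 1.5), (-1.5, 0), # blue squares
          (3, 3)]                        # a distant red triangle
labels = ["red", "red", "blue", "blue", "blue", "red"]

print(knn_classify((0, 0), points, labels, k=3))  # red
print(knn_classify((0, 0), points, labels, k=5))  # blue
```

Note how the predicted class depends on k, which is why choosing k is the key tuning decision in KNN.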
In the KNN algorithm, the selected neighbors are all instances that have already been correctly classified: the method assigns a sample to a class based on the class of its nearest one or several neighbors. KNN is simple and effective. It is a lazy-learning algorithm: the classifier requires no training phase on the training set, so the training cost is essentially zero. The computational cost of KNN classification, however, is proportional to the size of the training set: if the training set contains N documents, the classification time complexity of KNN is O(N).
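The lazy-learning trade-off can be made concrete with a minimal sketch (the class name and interface here are illustrative, not a standard API): "fitting" only stores the data, while every prediction pays an O(N) scan over all stored examples.

```python
from collections import Counter
import math

class LazyKNN:
    """Minimal sketch of a lazy learner: fit() builds no model,
    predict() does all the work at query time."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        # "Training" is just storing the data set.
        self.X, self.y = X, y
        return self

    def predict(self, q):
        # Every prediction scans all N stored examples: O(N).
        nearest = sorted(zip(self.X, self.y),
                         key=lambda pair: math.dist(q, pair[0]))[:self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

clf = LazyKNN(k=3).fit([(0, 0), (0, 1), (5, 5), (5, 6)],
                       ["a", "a", "b", "b"])
print(clf.predict((1, 0)))  # "a"
```

This is the opposite of eager learners such as decision trees, which pay their cost up front during training and then classify quickly.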
Algorithm Introduction Series 2: the k-nearest neighbor algorithm