This content is taken from the public platform "Machine Learning Window"
and from http://www.cnblogs.com/kaituorensheng/p/3579347.html
In pattern recognition, the nearest neighbor method (the KNN or k-nearest neighbor algorithm) classifies a sample according to the closest training samples in the feature space. The method classifies with a vector space model: instances of the same category are highly similar to one another, so the likely category of an unknown instance can be assessed by computing its similarity to instances whose categories are already known.
k-NN is a form of instance-based learning, or lazy learning based on local approximation, in which all computation is deferred until classification.
The k-nearest neighbor algorithm is one of the simplest of all machine learning algorithms: an object is assigned to the class most common among its k nearest neighbors (k is a positive integer, usually small). If k = 1, the object is simply assigned to the class of its single nearest neighbor. The same method can be used for regression, by assigning the object the average of the property values of its k nearest neighbors. It can also be useful to weight the neighbors so that nearer neighbors contribute more to the result than more distant ones. (A common weighting scheme is to give each neighbor a weight of 1/d, where d is the distance to that neighbor.) This scheme is a generalization of linear interpolation. The neighbors are taken from a set of objects whose class (or, in the case of regression, property value) is already known. Although no explicit training step is required, this set can be regarded as the algorithm's training set. The k-nearest neighbor algorithm is very sensitive to the local structure of the data, and the nearest neighbor rule in effect computes the decision boundary, and can do so efficiently.
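For concreteness, here is a minimal sketch of the majority vote with the 1/d weighting scheme described above; the function name, data layout, and tie-handling details are illustrative assumptions, not taken from the original post:

```python
import math
from collections import defaultdict

def weighted_knn_classify(query, samples, labels, k=3):
    """Classify `query` by a 1/d weighted vote among its k nearest samples."""
    # Euclidean distance from the query to every known-category sample
    dists = sorted((math.dist(query, s), y) for s, y in zip(samples, labels))
    votes = defaultdict(float)
    for d, y in dists[:k]:
        # Each neighbor contributes a weight of 1/d; the max() guards against
        # dividing by zero when the query coincides with a training sample.
        votes[y] += 1.0 / max(d, 1e-12)
    return max(votes, key=votes.get)
```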
Target : Classify an instance whose category is unknown
Input : The unknown-category instance to classify, and a collection of instances with known, fixed categories
Output : The likely category of the instance
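Mapped onto that contract with hypothetical toy data (and reusing the weighted_knn_classify sketch above), a call might look like this:

```python
# Known-category instances (feature vectors) and their labels
train_points = [(1.0, 1.0), (1.2, 0.8), (6.0, 6.2), (5.8, 6.1)]
train_labels = ["A", "A", "B", "B"]

# Unknown-category instance to classify
query_point = (1.1, 0.9)

# Output: the likely category of the query instance
print(weighted_knn_classify(query_point, train_points, train_labels, k=3))  # -> A
```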
The specific analysis is as follows:
The k-nearest neighbor method (k nearest neighbor algorithm, k-NN) is the most basic classification algorithm in machine learning: it finds the k nearest neighbor instances of a query in the training data set and assigns the query the category that appears most often among those k neighbors. When k = 1, the query is simply given the category of its single nearest neighbor.
As shown in the figure (from the Wikipedia article): when k = 3, two of the three neighbors are red, so the green input instance is classified as a red triangle; when k = 5, three of the five neighbors are blue, so the input instance is classified as a blue square.
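A small unweighted majority-vote example reproduces how the prediction flips between k = 3 and k = 5; the toy coordinates below are chosen only to mimic the Wikipedia figure and are not from the original post:

```python
import math
from collections import Counter

# Toy layout: 2 red triangles close to the green query, 3 blue squares farther out
points = [(0.5, 0.5), (0.6, -0.4),             # red triangles
          (1.5, 0.0), (-1.4, 0.3), (0.2, 1.6)]  # blue squares
labels = ["red triangle", "red triangle",
          "blue square", "blue square", "blue square"]
query = (0.0, 0.0)  # the green instance to classify

ranked = sorted(zip(points, labels), key=lambda pl: math.dist(query, pl[0]))
for k in (3, 5):
    vote = Counter(label for _, label in ranked[:k]).most_common(1)[0][0]
    print(f"k={k}: {vote}")
# k=3 -> red triangle (2 red vs 1 blue), k=5 -> blue square (3 blue vs 2 red)
```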
In the classification process, the value of k is usually a constant chosen in advance by the user, and as can be seen above, the choice of k has a large effect on the results. A larger k reduces the effect of noise on the classification, but instances of other classes that are farther from the query then also play a role in the decision. The best value of k is usually chosen by cross-validation.
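As one illustration of choosing k by cross-validation, the sketch below uses scikit-learn; the iris dataset, the 5 folds, and the candidate range 1 to 15 are placeholder choices, not from the original post:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Score each candidate k with 5-fold cross-validation and keep the best one
scores = {}
for k in range(1, 16):
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(f"best k = {best_k} (accuracy {scores[best_k]:.3f})")
```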
This post is part of the machine learning algorithm series: the nearest neighbor method (KNN algorithm).