K-Nearest Neighbor Algorithm (KNN)

Source: Internet
Author: User

What is the k-nearest neighbor algorithm? The K-Nearest Neighbor algorithm, abbreviated KNN, can be roughly understood from its name alone: find the K nearest neighbors. When K = 1, it reduces to the nearest neighbor algorithm, that is, finding the single closest neighbor. Why look for neighbors? Suppose you arrive in an unfamiliar village and want to fit in; a natural strategy is to find the people whose characteristics are most similar to yours and join them.
In formal terms, the K-nearest neighbor algorithm works as follows: given a training data set, for a new input instance, find the K instances in the training set that are nearest to it (the K neighbors above); if the majority of those K instances belong to a certain class, the input instance is classified into that class. With that in mind, consider the following figure from Wikipedia:


As the figure shows, there are two kinds of sample data, marked with small blue squares and small red triangles respectively, and the green circle in the middle marks the data point to be classified. That is, we do not yet know which category the green point belongs to (blue square or red triangle), and the task below is to classify the green circle.
• If k = 3, the 3 nearest neighbors of the green point are 2 red triangles and 1 blue square. The minority is subordinate to the majority, so by this voting rule the green point is assigned to the red triangle category.
• If k = 5, the 5 nearest neighbors of the green point are 2 red triangles and 3 blue squares. Again the minority is subordinate to the majority, so the green point is assigned to the blue square category.
As we can see, when we cannot determine which known category the current point belongs to, we can look at its position relative to the data, weigh its neighbors, and assign it to the category carrying the greater weight. This is the core idea of the K-nearest neighbor algorithm.
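The distance-then-vote procedure above can be sketched in a few lines of plain Python. This is a minimal brute-force illustration, not a production implementation; the coordinates below are hypothetical values chosen to mirror the figure's red-triangle/blue-square example.

```python
import math
from collections import Counter

def knn_classify(query, training_data, k):
    """Classify `query` by majority vote among its k nearest training points.

    training_data is a list of (point, label) pairs; points are coordinate tuples.
    """
    # Sort training points by Euclidean distance to the query point, keep the k closest.
    neighbors = sorted(training_data,
                       key=lambda item: math.dist(query, item[0]))[:k]
    # Majority vote among the k nearest labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical points echoing the figure: "red" triangles and "blue" squares.
data = [((1.0, 1.0), "red"), ((1.5, 1.2), "red"),
        ((0.9, 0.8), "blue"), ((3.0, 3.0), "blue"),
        ((3.2, 2.8), "blue")]

print(knn_classify((1.1, 1.0), data, 3))  # → red  (2 red vs 1 blue among 3 nearest)
print(knn_classify((1.1, 1.0), data, 5))  # → blue (2 red vs 3 blue among all 5)
```

Note how the same query point flips category as k grows, exactly as in the k = 3 versus k = 5 discussion above.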

Selection of the K value


1. If you choose a small value of k, it is equivalent to predicting with training instances from a small neighborhood. The approximation error of "learning" decreases, since only training instances close or similar to the input instance affect the prediction; the problem is that the estimation error of "learning" increases. In other words, decreasing k makes the overall model more complex and prone to overfitting.
2. If you choose a large value of k, it is equivalent to predicting with training instances from a large neighborhood. The advantage is that the estimation error of learning decreases; the disadvantage is that the approximation error increases, because training instances far from the input instance now also affect the prediction and make it less accurate. Increasing k makes the overall model simpler.
3. k = N (the size of the training set) is completely inadvisable: no matter what the input instance is, the prediction is simply the most common class in the training set. The model is then far too simple and ignores a great deal of useful information in the training instances.
In practical applications, k generally takes a relatively small value, and cross-validation (in short: using part of the sample as the training set and part as the validation set) is used to select the best k.
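One simple form of cross-validation for choosing k is leave-one-out: hold out each training point in turn, classify it with the remaining points, and pick the k with the highest held-out accuracy. The sketch below assumes a brute-force classifier like the one described earlier; the two-cluster data set (with one deliberately mislabeled outlier) is hypothetical and exists only to make small and large k behave differently.

```python
import math
from collections import Counter

def knn_classify(query, training_data, k):
    # Majority vote among the k nearest training points (Euclidean distance).
    neighbors = sorted(training_data,
                       key=lambda item: math.dist(query, item[0]))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def loo_accuracy(data, k):
    # Leave-one-out cross-validation: hold out each point in turn,
    # classify it with the rest, and count correct predictions.
    correct = sum(
        knn_classify(point, data[:i] + data[i + 1:], k) == label
        for i, (point, label) in enumerate(data))
    return correct / len(data)

# Hypothetical toy data: two clean clusters plus one noisy "b" point
# sitting inside the "a" cluster, which penalizes k = 1.
data = ([((i * 0.1, i * 0.1), "a") for i in range(10)] +
        [((2 + i * 0.1, 2 + i * 0.1), "b") for i in range(10)] +
        [((0.25, 0.25), "b")])

# Pick the k with the best held-out accuracy among a few odd candidates
# (odd values avoid voting ties in two-class problems).
best_k = max([1, 3, 5, 7], key=lambda k: loo_accuracy(data, k))
print(best_k)
```

With this data, k = 1 misclassifies the points nearest the noisy outlier, while k ≥ 3 smooths it away, so the selection favors a k greater than 1, in line with the overfitting discussion above.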
