"Machine Learning in Detail": Concept, Error Rate, and Problems of KNN Classification


When reprinting, please cite the source: http://blog.csdn.net/luoshixian099/article/details/50923056

Do not build a high platform on shifting sand.

KNN Concept

KNN (k-nearest neighbors) is a non-parametric classification algorithm. Given a training set of n samples, find the K samples nearest to the test point x, count the number k_i of those K samples that fall into each category w_i, and assign x to the category w_i with the largest k_i. When K=1 the method is called the nearest-neighbor algorithm: in the sample data D, find the single sample nearest to x and assign x to that sample's category. The most common distance measure is the Euclidean distance.

Algorithm Flow:

    1. Choose K and a distance measure (typically Euclidean distance).
    2. Compute the distance from the test point x to every training sample.
    3. Take the K training samples nearest to x.
    4. Count the number k_i of neighbors falling into each category w_i.
    5. Return the category w_i with the largest k_i.
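The flow above is easy to sketch in code. Below is a minimal brute-force version in Python with NumPy; the function name and toy data are illustrative, not from the original post:

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Brute-force KNN: majority vote among the k training samples
    nearest to x under the Euclidean distance."""
    # Step 2: distance from x to every training sample.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Step 3: indices of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # Steps 4-5: count labels k_i per category w_i, return the largest.
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy usage: two blue points and one black point.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [2.0, 2.0]])
y_train = np.array(["blue", "blue", "black"])
print(knn_predict(X_train, y_train, np.array([0.05, 0.1]), k=3))  # -> "blue"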



The left image shows prediction of the color of the middle '*' point in a two-dimensional plane: with K=11, the 11 nearest neighbors contain 4 black points and 7 blue points, so '*' is predicted to be blue.

As shown in the right image: when K=1 (the nearest-neighbor algorithm), the space is effectively divided into n regions, one per training sample. Every point inside a region belongs to that sample's category, because it is closer to that sample than to any other sample. This partition is known as a Voronoi tessellation.
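The Voronoi tessellation itself can be computed directly, for example with SciPy; this sketch uses made-up points, not the figure's data:

import numpy as np
from scipy.spatial import Voronoi, voronoi_plot_2d
import matplotlib.pyplot as plt

# Made-up 2-D training samples; under 1-NN, each Voronoi cell is the region
# of the plane assigned to the category of the sample that generated it.
rng = np.random.default_rng(1)
points = rng.uniform(0, 10, size=(20, 2))

voronoi_plot_2d(Voronoi(points))
plt.show()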


----------------------------------------------------------------------

The following four images show a two-dimensional plane with data points from 3 classes, using K=10. Panel (a) shows the sample data points; panel (b) is a heat map of the probability of y=1 (corresponding to '+') at each position on the plane; panel (c) is the corresponding heat map for class y=2 (corresponding to '*'); panel (d) shows the MAP estimate (the category with maximum probability) at each point of the plane.
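Panels (b)-(d) can be reproduced by evaluating the neighbor-vote fractions on a grid. Here is a sketch using scikit-learn; the three-blob data is made up to stand in for panel (a):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for panel (a): three Gaussian blobs, classes y in {1, 2, 3}.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [3.0, 0.0], [1.5, 2.5]])
X = np.vstack([rng.normal(c, 0.8, size=(30, 2)) for c in centers])
y = np.repeat([1, 2, 3], 30)

knn = KNeighborsClassifier(n_neighbors=10).fit(X, y)

# Class probabilities on a grid: fraction of the 10 neighbors per class,
# as in the heat maps of panels (b) and (c).
xx, yy = np.meshgrid(np.linspace(-3, 6, 200), np.linspace(-3, 6, 200))
grid = np.c_[xx.ravel(), yy.ravel()]
proba = knn.predict_proba(grid)   # columns: P(y=1), P(y=2), P(y=3)
map_label = knn.predict(grid)     # MAP estimate, as in panel (d)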

----------------------------------------------------------------------

KNN Algorithm Error Rate

Denote the optimal Bayes error rate by P_B. The following asymptotic error rates of the KNN algorithm have been proved in the literature (see the references at the end):

When the sample size n tends to infinity and K=1: P_B ≤ P_1NN ≤ P_B(2 − (M/(M−1))·P_B) ≤ 2·P_B, where M is the total number of data categories.

When the sample size n tends to infinity and M=2: P_B ≤ P_KNN ≤ P_B + 1/sqrt(K·e), where e is the base of the natural logarithm.

These formulas show that the KNN algorithm improves on the 1-NN algorithm because, as K grows, it lowers the upper bound on the error. As K increases, P_KNN asymptotically approaches the optimal error rate P_B; in fact, when K→∞ (while still remaining a small fraction of the total sample size n), the KNN classifier approaches the Bayes classifier.
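A quick numeric check of these bounds; the values of P_B and K are made up for illustration:

import math

P_B, M = 0.05, 2  # assumed Bayes error rate and number of categories

# Asymptotic 1-NN upper bound: P_B * (2 - M/(M-1) * P_B) <= 2 * P_B
print("1-NN bound:", P_B * (2 - M / (M - 1) * P_B))  # 0.095

# Two-class KNN upper bound P_B + 1/sqrt(K*e) shrinks toward P_B as K grows.
for K in (10, 100, 1000, 10000):
    print(f"K={K:>5}: {P_B + 1 / math.sqrt(K * math.e):.4f}")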


Problems of the KNN Algorithm
    • When the amount of data n is large and the data dimension d is high, search efficiency drops sharply. With brute-force search, the complexity is O(n·d) per query (computing the distance to all n samples). To increase efficiency, a KD-tree can be used; see: KD-tree and BBF algorithm analysis.
    • Sometimes, depending on the situation, the number of samples needs to be reduced; prototype editing or condensing algorithms can be used. The editing algorithm treats each training sample in turn as a test sample and rejects it if the KNN classification (by the remaining samples) is wrong (a sketch appears after this list).
    • When the total sample size n is very small, the error rate rises. One solution is to learn the distance metric, using different measures for different samples to reduce the error rate; such methods can be divided into global, class-dependent, and locally-dependent methods.
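Here is a minimal sketch of the editing idea from the second item above (often called Wilson editing), assuming scikit-learn is available; each sample is classified by its k nearest neighbors among the remaining samples and rejected if misclassified:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def edit_samples(X, y, k=3):
    """Wilson-style editing: drop every sample that is misclassified
    by its k nearest neighbors among the *other* samples."""
    keep = np.ones(len(X), dtype=bool)
    for i in range(len(X)):
        mask = np.arange(len(X)) != i  # leave sample i out
        knn = KNeighborsClassifier(n_neighbors=k).fit(X[mask], y[mask])
        if knn.predict(X[i:i + 1])[0] != y[i]:
            keep[i] = False            # reject misclassified sample
    return X[keep], y[keep]

For the efficiency issue in the first item, scikit-learn's KNeighborsClassifier(algorithm='kd_tree') switches the neighbor search from brute force to a KD-tree.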


References:

Murphy, K. P. Machine Learning: A Probabilistic Perspective.

Theodoridis, S., and Koutroumbas, K. Pattern Recognition, 4th ed.

