KNN Classification Algorithm Supplement

Source: Internet
Author: User

KNN Supplement:

1. How large should K be?

If K is too small, the classification result is easily affected by noise points; if K is too large, the neighbourhood may contain too many points from other classes.

(With distance-weighted voting, the result becomes less sensitive to the choice of K.)
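
As a minimal sketch of this trade-off (assuming plain Euclidean distance and an unweighted majority vote; the names `euclidean` and `knn_predict` and the toy data are illustrative, not from the original), the same query changes class as K grows:

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k):
    """Plain majority vote among the k training points nearest to `query`.

    `train` is a list of (point, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: a small "A" cluster near the origin containing one
# mislabelled "B" point, and a larger "B" cluster far away.
train = [
    ((0.0, 0.0), "A"), ((0.2, 0.1), "A"), ((0.1, 0.3), "A"),
    ((0.5, 0.5), "B"),  # noise point inside the "A" region
    ((5.0, 5.0), "B"), ((5.1, 4.9), "B"), ((4.8, 5.2), "B"),
    ((5.2, 5.1), "B"), ((4.9, 4.8), "B"),
]
query = (0.45, 0.45)

print(knn_predict(train, query, k=1))  # B: the single noise point decides
print(knn_predict(train, query, k=3))  # A: the local majority wins
print(knn_predict(train, query, k=9))  # B: with k = n the global majority wins
```

K = 1 follows the noise point, while K = n ignores locality altogether; a moderate K sits between the two failure modes described above.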

K is usually chosen by cross-validation, using K = 1 as the baseline.

Rule of thumb: K is generally smaller than the square root of the number of training samples.
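
Both hints can be combined in a small sketch. It assumes leave-one-out cross-validation (one particular form of cross-validation, convenient for small data sets) and tries every K from the K = 1 baseline up to the square root of n, keeping the best-scoring one. `loo_accuracy`, `choose_k` and the data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k nearest (point, label) pairs."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def loo_accuracy(data, k):
    """Leave-one-out accuracy: classify each sample using all the others."""
    hits = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, point, k) == label
    return hits / len(data)


def choose_k(data):
    """Score k = 1 (the baseline) up to floor(sqrt(n)); return the best k."""
    k_max = max(1, math.isqrt(len(data)))
    scores = {k: loo_accuracy(data, k) for k in range(1, k_max + 1)}
    best = max(scores, key=scores.get)  # ties go to the smallest k
    return best, scores


# Hypothetical 1-D data: class "A" at 0..3 with a mislabelled "B" at 1.5,
# and class "B" at 10..13. n = 9, so k is tried from 1 to 3.
data = [((x,), "A") for x in (0.0, 1.0, 2.0, 3.0)] + [((1.5,), "B")]
data += [((x,), "B") for x in (10.0, 11.0, 12.0, 13.0)]

best, scores = choose_k(data)
print(best)  # 3: k = 1 and k = 2 are both misled by the noise point
```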

2. How should the most appropriate class be decided?

Weighted voting is generally more appropriate than a simple majority vote. How to set the weights must be worked out from the specific business problem and the characteristics of the data.
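
A sketch of weighted voting with a pluggable weight function. The two weightings shown, inverse distance (1/d) and a Gaussian kernel with width `sigma`, are common choices assumed here for illustration; the text does not prescribe either, and all names are hypothetical. The example also reflects the remark in section 1: with distance weighting, even K = 5 (every training point) still follows the nearby class.

```python
import math
from collections import defaultdict


def inverse_distance(d, eps=1e-9):
    """Weight 1/d; eps avoids division by zero for an exact match."""
    return 1.0 / (d + eps)


def gaussian(d, sigma=1.0):
    """Gaussian kernel weight; a larger sigma flattens toward a plain vote."""
    return math.exp(-(d ** 2) / (2 * sigma ** 2))


def weighted_knn_predict(train, query, k, weight=inverse_distance):
    """Each of the k nearest neighbours votes with weight(distance)."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    scores = defaultdict(float)
    for point, label in neighbours:
        scores[label] += weight(math.dist(point, query))
    return max(scores, key=scores.get)


# Hypothetical data: two "A" points right next to the query, three "B" points
# farther away. With k = 5 a plain majority vote would say "B".
train = [((0.1, 0.0), "A"), ((0.0, 0.1), "A"),
         ((1.0, 0.0), "B"), ((0.0, 1.0), "B"), ((1.0, 1.0), "B")]
query = (0.0, 0.0)

print(weighted_knn_predict(train, query, k=5))                         # A
print(weighted_knn_predict(train, query, k=5, weight=gaussian))        # A
print(weighted_knn_predict(train, query, k=5,
                           weight=lambda d: gaussian(d, sigma=10.0)))  # B
```

Which kernel and which `sigma` fit best is exactly the data-dependent question raised above; the last call shows a very wide Gaussian drifting back toward an unweighted vote.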

3. How should the distance metric be chosen?

Effect of high dimensionality on distance: it is well known that the more variables there are, the less discriminating the Euclidean distance becomes.
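
A small experiment (an illustration, not a proof) makes this visible: for points drawn uniformly from the unit cube, the relative gap between the nearest and the farthest point from a query shrinks as the number of dimensions grows. `relative_contrast` is a hypothetical helper, and the uniform data is an assumption.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of the distances from one random query point to
    n_points random points, all drawn uniformly from the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


# The contrast falls sharply as dim increases: "near" and "far" blur together.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```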

Effect of variable range on distance: variables with larger ranges tend to dominate the distance calculation, so the variables should be normalized first.
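
A minimal sketch of min-max normalization, which is one common choice (z-score standardization is another). The age/income records are hypothetical; in the raw data the income column overwhelms age.

```python
import math


def min_max_scale(rows):
    """Rescale every column to [0, 1]; a constant column becomes all zeros."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / span if span else 0.0
             for v, lo, span in zip(row, lows, spans)]
            for row in rows]


def nearest(rows, i):
    """Index of the row closest to rows[i] (excluding i itself)."""
    others = [j for j in range(len(rows)) if j != i]
    return min(others, key=lambda j: math.dist(rows[i], rows[j]))


# Hypothetical records: (age in years, annual income).
people = [(25, 50000), (60, 51000), (27, 53000), (45, 70000)]

print(nearest(people, 0))                 # 1: income differences swamp age
print(nearest(min_max_scale(people), 0))  # 2: both features now count
```

In practice the minimum and maximum would be computed on the training set only, and the same values reused to scale each query.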

4. Should all training samples be treated equally?

In a training set, some samples may be more trustworthy than others.

This can also be seen as a data-quality problem.

Different weights can be assigned to different samples, increasing the weight of reliable samples and reducing the influence of unreliable ones.
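
A sketch of per-sample weighting, assuming each training sample carries a hypothetical `reliability` score in (0, 1]. How to obtain such scores is left to the data owner, as the text suggests. Each neighbour's vote is scaled by its reliability:

```python
import math
from collections import defaultdict


def knn_predict_reliability(train, query, k):
    """k-NN vote where each neighbour votes with its own reliability weight.

    `train` is a list of (point, label, reliability) triples.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    scores = defaultdict(float)
    for _, label, reliability in neighbours:
        scores[label] += reliability
    return max(scores, key=scores.get)


# Hypothetical data: the two "B" samples come from a source we trust less.
train = [((0.3, 0.0), "A", 1.0),
         ((0.1, 0.0), "B", 0.3), ((0.0, 0.2), "B", 0.3),
         ((3.0, 3.0), "A", 1.0)]
uniform = [(p, y, 1.0) for p, y, _ in train]  # same data, equal trust
query = (0.0, 0.0)

print(knn_predict_reliability(uniform, query, k=3))  # B: plain vote, 2 to 1
print(knn_predict_reliability(train, query, k=3))    # A: 1.0 beats 0.3 + 0.3
```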

5. What about performance?

KNN is a lazy algorithm: it does no real studying beforehand and only crams at test time (when classifying a test sample), hurriedly looking up the K nearest neighbours.

The consequence of laziness: building the model is trivial, but classifying test samples is expensive, because every training sample must be scanned and its distance to the test sample computed.
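
The laziness can be made concrete with a small brute-force classifier. This is a sketch; the `LazyKNN` class and its `distance_calls` counter are hypothetical. `fit` only stores the data, while every `predict` call computes one distance per training sample, so the cost per query grows with the number of samples times the number of features.

```python
import math
import random
from collections import Counter


class LazyKNN:
    """Minimal lazy k-NN: fit() only memorises, predict() does all the work."""

    def __init__(self, k):
        self.k = k
        self.points, self.labels = [], []
        self.distance_calls = 0  # how many distances predict() has computed

    def fit(self, points, labels):
        # "Training" just stores the data; no model is built.
        self.points, self.labels = list(points), list(labels)
        return self

    def predict(self, query):
        # Brute force: every query scans all n training samples.
        scored = []
        for point, label in zip(self.points, self.labels):
            self.distance_calls += 1
            scored.append((math.dist(point, query), label))
        scored.sort(key=lambda pair: pair[0])
        return Counter(label for _, label in scored[:self.k]).most_common(1)[0][0]


rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(1000)]
labels = ["A" if x < 0.5 else "B" for x, _ in points]

model = LazyKNN(k=5).fit(points, labels)
for _ in range(10):
    model.predict((rng.random(), rng.random()))
print(model.distance_calls)  # 10000: 10 queries x 1000 training samples
```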

There are a number of ways to make this computation more efficient, such as compressing (reducing) the training set.
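
The text does not name a specific compression method. One classical option, assumed here for illustration, is Hart's condensed nearest neighbour (CNN) rule: keep only the training samples that 1-NN needs in order to classify the rest correctly. The function names below are hypothetical.

```python
import math
import random


def nn_label(store, query):
    """Label of the 1-nearest neighbour of `query` within `store`."""
    return min(store, key=lambda item: math.dist(item[0], query))[1]


def condense(train):
    """Hart's condensed nearest neighbour rule (a sketch).

    Grows a subset `store` until every training sample outside it is
    classified correctly by 1-NN on `store`. Items already kept are skipped,
    which also guarantees termination if identical points carry conflicting
    labels.
    """
    store = [train[0]]
    changed = True
    while changed:
        changed = False
        for item in train:
            if item not in store and nn_label(store, item[0]) != item[1]:
                store.append(item)
                changed = True
    return store


rng = random.Random(1)
train = [((rng.random(), rng.random()), "A") for _ in range(50)]
train += [((5 + rng.random(), 5 + rng.random()), "B") for _ in range(50)]

store = condense(train)
print(len(train), len(store))  # 100 2: each well-separated cluster needs one prototype
```

Each query now scans 2 stored samples instead of 100; on overlapping classes the condensed set is larger, and the saving smaller.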

