KNN Classification in Data Mining

Source: Internet
Author: User

Category: Algorithms. There are many classification algorithms: Bayesian classifiers, decision trees, support vector machines, KNN, and so on; neural networks can also be used for classification. This article describes the KNN classification algorithm.

1. Introduction

KNN is short for K Nearest Neighbor: the class labels of the K nearest neighbors are used to vote for the class label of a new instance. KNN is an instance-based learning algorithm. Unlike Bayesian and decision tree algorithms, KNN requires no training phase. When a new instance appears, the algorithm finds the K nearest instances in the training data set and assigns the new instance to the class that holds the majority among those K training instances. KNN is therefore also called a lazy learning algorithm: it needs no training, and its classification accuracy is high when the class boundaries are clean. The KNN algorithm does require the value of K, that is, how many nearest instances to consider, to be chosen manually, and different K values may lead to different classification results.
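The procedure above can be sketched in a few lines of Python. This is an illustrative minimal implementation, not code from the original article; the function name `knn_predict` and the data layout are assumptions made for the example:

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Compute the Euclidean distance from the query to every training point.
    dists = [(math.dist(query, x), label) for x, label in zip(train_X, train_y)]
    # Keep the k closest points and let their labels vote.
    dists.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (0.5, 0.5), k=3))  # -> a
```

Note that there is no training step at all: the entire training set is kept and scanned only when a query arrives, which is exactly what "lazy learning" means.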

2. Example

As shown in the figure, the training data set is divided into three classes (represented by three different colors). Now a new instance (the green dot in the figure) appears. Suppose K = 3, that is, we look for the three closest instances. The distance used here is the Euclidean distance. The K nearest instances are found by drawing a circle centered on the green point and choosing the minimum radius such that the circle contains K points.

Inside the red circle there are two instances of Category 2 and one of Category 3, but none of Category 1. Voting on the principle that the minority follows the majority, the new green instance should belong to Category 2.
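The vote itself is trivial to express in code. As a sketch (assuming, per the figure, that the three nearest neighbors split two-to-one between Category 2 and Category 3):

```python
from collections import Counter

# Labels of the k = 3 nearest neighbours found inside the circle.
neighbor_labels = ["Category 2", "Category 2", "Category 3"]

winner, count = Counter(neighbor_labels).most_common(1)[0]
print(winner)  # -> Category 2, with 2 of the 3 votes
```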

3. Selecting the K value

As mentioned above, the choice of K affects the classification result, so K must be chosen reasonably. Continuing with the classification example above, we now set K to 7, as shown in the figure:

We can see that when K = 7, among the seven nearest points there are three in Category 1, two in Category 2, and two in Category 3. In this case, the new green instance should be assigned to Category 1, which differs from the classification result when K = 3.

There is no absolute standard for selecting K. Note, however, that a larger K does not necessarily improve accuracy, and since K nearest neighbor is an O(K * n) algorithm, a K that is too large also makes the algorithm less efficient.
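The flip in the example above is easy to reproduce. In this sketch, the list of labels ordered by distance from the query is invented to match the figures (two of Category 2 and one of Category 3 closest, then Category 1 points further out):

```python
from collections import Counter

def vote(labels_by_distance, k):
    """Majority vote among the k nearest labels (already sorted by distance)."""
    return Counter(labels_by_distance[:k]).most_common(1)[0][0]

# Hypothetical training labels ordered by distance from the green point.
labels_by_distance = [2, 2, 3, 1, 1, 1, 3]

print(vote(labels_by_distance, 3))  # -> 2 (Category 2 wins 2:1)
print(vote(labels_by_distance, 7))  # -> 1 (Category 1 wins 3:2:2)
```

The same query point changes class purely because K changed, which is why K is the algorithm's most important parameter.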

Because the choice of K affects the result, some people consider the algorithm unstable. In fact the effect is not that great: it matters only for instances near the class boundaries, while for instances near a class center it is very small. For such an instance, K = 3, K = 5, and K = 11 all give the same result.

Finally, note that when the data set is unbalanced, you may need to weight the votes by the proportion of each class so that the accuracy on small classes does not become too low.
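One way to implement such proportional voting is to weight each neighbor's ballot by the inverse frequency of its class in the training set. This is an illustrative sketch; the function name, the class names "big" and "small", and the counts are all invented for the example:

```python
from collections import Counter

def weighted_vote(neighbor_labels, class_counts):
    """Vote where each neighbour's ballot is weighted by the inverse
    frequency of its class, so rare classes are not drowned out."""
    scores = Counter()
    for label in neighbor_labels:
        scores[label] += 1.0 / class_counts[label]
    return scores.most_common(1)[0][0]

# Hypothetical unbalanced training set: 90 points of "big", 10 of "small".
class_counts = {"big": 90, "small": 10}

# Among the 5 nearest neighbours, "big" outnumbers "small" 3 to 2,
# but the inverse-frequency weights (3/90 vs 2/10) favour "small".
print(weighted_vote(["big", "big", "big", "small", "small"], class_counts))  # -> small
```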

 
