There are many classification algorithms: Bayesian methods, decision trees, support vector machines, KNN, and so on; neural networks can also be used for classification. This article describes the KNN classification algorithm.
1. Introduction
KNN is short for K Nearest Neighbors: the K nearest neighbors vote for the class label of a new instance. KNN is an instance-based learning algorithm. Unlike Bayesian or decision tree algorithms, KNN requires no training phase; for this reason it is also called lazy learning. When a new instance appears, the algorithm finds the K nearest instances in the training set and assigns the new instance to the class that holds the majority among those K instances. Its classification accuracy is high when the class boundaries are clean. The value of K must be chosen manually, that is, we must decide how many nearest instances to look for, and different K values may lead to different classification results.
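The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not an optimized implementation; the `((x, y), label)` pair format for training instances is an assumption made here for simplicity.

```python
from collections import Counter
import math


def knn_classify(new_point, training_data, k):
    """Classify new_point by majority vote among its k nearest
    training instances, using Euclidean distance.

    training_data is a list of ((x, y), label) pairs -- a format
    assumed here purely for illustration.
    """
    # No training phase: all the work happens at query time,
    # which is why KNN is called "lazy learning".
    neighbors = sorted(
        training_data,
        key=lambda item: math.dist(new_point, item[0]),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

For example, a point surrounded by three instances of class `'a'` would be assigned to `'a'` even if distant `'b'` instances exist elsewhere in the training set.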
2. Example
As shown in the figure, the training dataset is divided into three classes (drawn in three different colors). Now a new instance (the green dot in the figure) appears. Suppose K = 5, that is, we look for the five closest instances, where distance is defined as Euclidean distance. To find the K nearest instances, draw a circle centered on the green point and grow its radius until the circle contains exactly K points.
Inside the red circle, three points belong to Category 2 and two to Category 3, while none belong to Category 1. Voting on the majority-rules principle, the new green instance should be assigned to Category 2.
3. Selecting the K value
As mentioned above, the choice of K affects the classification result, so K must be chosen sensibly. Continuing with the classification example above, now set K = 7, as shown in the figure:
We can see that with K = 7, the seven nearest points include three in Category 1, two in Category 2, and two in Category 3. In this case the new green instance should be assigned to Category 1, which differs from the result when K = 5.
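The flip in the predicted class can be demonstrated with a small vote-counting sketch. The label sequence below is hypothetical (it is not the exact tally from the figures); it simply shows that enlarging K can change the winner of the vote.

```python
from collections import Counter


def majority(labels):
    """Return the most common label in the list."""
    return Counter(labels).most_common(1)[0][0]


# Hypothetical neighbour labels, already sorted by distance
# (nearest first). Not taken from the article's figures.
sorted_labels = [2, 2, 1, 1, 3, 1, 1]

print(majority(sorted_labels[:3]))  # K = 3 -> class 2
print(majority(sorted_labels[:7]))  # K = 7 -> class 1
```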
There is no absolute standard for selecting K, but intuitively a larger K does not necessarily improve accuracy, and since finding the K nearest neighbors is an O(K · n) algorithm (for n training instances), a large K also makes the algorithm slower.
Because the choice of K affects the result, some people consider the algorithm unstable. In practice the effect is not that large: it matters only for instances near the class boundaries, while instances near a class center are barely affected. For such an instance the result is the same whether K = 3, K = 5, or K = 11.
Finally, note that when the dataset is imbalanced, you may need to weight the votes by the proportion of each class, so that the accuracy on small classes does not become too low.
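One simple way to implement such proportional voting is to weight each neighbor's vote by the inverse of its class size. The sketch below assumes the per-class training counts are available in a `class_counts` mapping; the function name and data format are illustrative, not a standard API.

```python
def weighted_vote(neighbor_labels, class_counts):
    """Vote weighted by inverse class frequency.

    class_counts maps each label to the number of training instances
    in that class, so a vote from a rare class counts for more and a
    small class is not drowned out by an abundant one.
    """
    scores = {}
    for label in neighbor_labels:
        scores[label] = scores.get(label, 0.0) + 1.0 / class_counts[label]
    return max(scores, key=scores.get)
```

For example, with a 90-instance majority class and a 10-instance minority class, three majority neighbors contribute 3/90 ≈ 0.033 while two minority neighbors contribute 2/10 = 0.2, so the minority class wins a vote that plain majority voting would have lost.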