KNN in Data Mining

The K nearest neighbor algorithm is a non-parametric method used frequently in classification problems. The algorithm is clear and concise: for the sample to be classified, find its K nearest samples in the training set. These K samples then vote, and the sample to be classified is assigned the category held by the majority of them.
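
As a quick illustration, here is a minimal Python sketch of that voting procedure, assuming numeric feature vectors and plain Euclidean distance (the function name and the toy data are made up for the example):

import math
from collections import Counter

def knn_classify(query, training_samples, k):
    """Return the majority label among the k training samples nearest to query.

    training_samples is a list of (feature_vector, label) pairs.
    """
    # Distance from the query to every training sample (lazy learning:
    # nothing is precomputed, everything happens at classification time).
    distances = [
        (math.dist(query, features), label)
        for features, label in training_samples
    ]
    # Keep the k nearest samples and let them vote.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Example: classify the point (1, 1) with k = 3.
train = [((0, 0), "A"), ((1, 0), "A"), ((5, 5), "B"), ((6, 5), "B")]
print(knn_classify((1, 1), train, k=3))  # -> "A"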

There are two main questions in the algorithm: 1. How do we measure "nearest"? 2. What should K be?

For the first question, there are three cases to discuss (a code sketch follows the list):

A. Nominal attributes: if two samples have the same value for the attribute, their distance is 0; otherwise it is 1. Example: for a gender attribute, if both samples are male the distance is 0; if one is male and the other female, the distance is 1.

B. Ordinal attributes: consider a student's grade rating such as {poor, fair, ok, good, perfect}. We can map each level to a consecutive integer starting at 0: {poor=0, fair=1, ok=2, good=3, perfect=4}. If the ratings of two students are good and fair, we can define the distance as 3 − 1 = 2.

C. Continuous attributes: these can be measured with the Euclidean distance √(∑(xᵢ − yᵢ)²). For example, the distance between the points (1, 2) and (3, 4) is √((1 − 3)² + (2 − 4)²) = √8 = 2√2.
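
Putting the three cases into code, a minimal sketch might look like this (the function names and the level mapping are illustrative, not from the original article):

def nominal_distance(a, b):
    """Case A: 0 if the values match, 1 otherwise."""
    return 0 if a == b else 1

# Case B: map each ordinal level to a consecutive integer starting at 0.
LEVELS = {"poor": 0, "fair": 1, "ok": 2, "good": 3, "perfect": 4}

def ordinal_distance(a, b):
    return abs(LEVELS[a] - LEVELS[b])

def euclidean_distance(x, y):
    """Case C: Euclidean distance between two equal-length vectors."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

print(nominal_distance("male", "female"))   # 1
print(ordinal_distance("good", "fair"))     # 2
print(euclidean_distance((1, 2), (3, 4)))   # 2.828... = 2 * sqrt(2)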

If a sample contains all three kinds of attributes, we need to normalize each attribute before computing the distance, or choose another algorithm such as decision trees or naive Bayes.
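
One common way to do that normalization, sketched below, is to divide each per-attribute distance by its maximum possible value, so every attribute contributes a value in [0, 1] before the distances are summed. This particular scheme is an assumption for illustration, not something the article specifies:

def normalized_ordinal_distance(a, b, levels):
    # Divide by the widest possible gap between levels.
    return abs(levels[a] - levels[b]) / (len(levels) - 1)

def normalized_continuous_distance(a, b, lo, hi):
    """Min-max normalize one continuous attribute, given its
    observed range [lo, hi] in the training data."""
    return abs(a - b) / (hi - lo)

# Mixed-attribute distance: gender (nominal) + grade (ordinal) + height (continuous).
levels = {"poor": 0, "fair": 1, "ok": 2, "good": 3, "perfect": 4}
d = (
    (0 if "male" == "male" else 1)                          # nominal: 0.0
    + normalized_ordinal_distance("good", "fair", levels)   # ordinal: 0.5
    + normalized_continuous_distance(170, 180, 150, 200)    # continuous: 0.2
)
print(d)  # 0.7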

For the second question, I think the best approach is to test: set aside a validation sample set and try different K values to find which works best. Of course, for large-scale data this method may not be practical, and then the experience and judgment of the engineer become particularly important. Much of the literature suggests a K value between 3 and 10; experience shows this range controls noise interference well.
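
Here is a sketch of that "just test it" approach using scikit-learn (my choice of library, the article names none), scanning the suggested range of 3-10 with cross-validation on synthetic data:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for a real training set.
X, y = make_classification(n_samples=300, n_features=4, random_state=0)

best_k, best_score = None, -1.0
for k in range(3, 11):  # the 3-10 range mentioned above
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print(best_k, round(best_score, 3))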

Features of the k nearest neighbor algorithm: A. There is no need to build a model (it is a lazy learning method), but the computational overhead is very high: each time a sample is classified, its distance to every training sample must be calculated.

B. It can create decision boundaries of arbitrary shape, whereas a decision tree produces only axis-parallel (rectilinear) boundaries.

C. Choosing an appropriate distance metric is important.

This article is from the "Lu Yao" blog; please be sure to keep the source: http://cwxfly.blog.51cto.com/6113982/1690589
