1. Principle
To classify a tuple, compute the distance from the tuple to every sample in the training set, take the K training samples nearest to it, and assign the tuple to whichever class holds the majority among those K samples.
Training samples are described by n-dimensional numeric attributes, so each sample is a point in an n-dimensional space, and all training samples are stored in that n-dimensional pattern space. Given an unknown sample, the classifier searches the pattern space for the K training samples nearest to it.
2. Required Information
- Training set
- A distance metric
- The number of nearest neighbors, K
- Calculating the distance between two points
- For example, the Euclidean distance can be used: d = sqrt((x1-y1)^2 + (x2-y2)^2 + ... + (xn-yn)^2)
- Determining the classification result from the nearest-neighbor list
- Method one: take the class label held by the majority of the K nearest neighbors
- Method two: weight each neighbor's vote by its distance, e.g. with the weight factor w = 1/d^2
- Selection of the value of K
If K is too small, the classifier is overly sensitive to noise in the data;
If K is too large, the neighborhood may contain points of other classes;
An empirical rule of thumb is K ≤ sqrt(q), where q is the number of training tuples. Commercial algorithms typically use 10 as the default value.
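The distance metric and the two voting weights above can be sketched in Java. This is a minimal illustration, not code from the referenced implementation; the class and method names are my own.

```java
// Sketch of the Euclidean distance and the distance-based vote weight w = 1/d^2.
// Assumes feature vectors are plain double arrays of equal length.
public class KnnMath {
    // Euclidean distance between two n-dimensional points:
    // d = sqrt((x1-y1)^2 + (x2-y2)^2 + ... + (xn-yn)^2)
    static double euclidean(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // "Method two" vote weight: closer neighbors count more, w = 1 / d^2.
    static double weight(double d) {
        return 1.0 / (d * d);
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.0};
        double[] b = {3.0, 4.0};
        System.out.println(euclidean(a, b));          // prints 5.0
        System.out.println(weight(euclidean(a, b)));  // prints 0.04
    }
}
```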
3. General description
Algorithm: K-nearest-neighbor classification.
Input: training set T; number of nearest neighbors K; tuple t to be classified.
Output: category c.
(1) N = ∅; // the neighbor set
(2) FOR each d ∈ T DO BEGIN
(3)   IF |N| ≤ K THEN // keep the size of N at K
(4)     N = N ∪ {d};
(5)   ELSE
(6)   IF ∃ u ∈ N such that sim(t, u) < sim(t, d) THEN BEGIN
      /* If N contains a sample u whose similarity to t is lower than d's similarity to t
         (the strict inequality guarantees u != d), then d can replace u in N:
         remove u from N, and d becomes a new member of N. */
(7)     N = N - {u}; // remove u
(8)     N = N ∪ {d}; // add d
(9)   END
(10) END
(11) c = the class to which the most u ∈ N belong.
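The pseudocode above can be sketched in Java as follows. This is an assumed illustration, not the referenced implementation: the `Sample` record and method names are my own, and Euclidean distance stands in for the similarity function sim (smaller distance = higher similarity, so the comparison direction flips relative to the pseudocode).

```java
import java.util.*;

public class Knn {
    // Assumed data shape: a feature vector plus a class label.
    record Sample(double[] x, String label) {}

    static double dist(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return Math.sqrt(s);
    }

    // Classify t by majority vote among the k nearest training samples.
    static String classify(List<Sample> training, double[] t, int k) {
        // Max-heap by distance: peek() is the farthest of the current neighbors,
        // playing the role of u in steps (6)-(8) of the pseudocode.
        PriorityQueue<Sample> neighbors = new PriorityQueue<>(
            Comparator.comparingDouble((Sample s) -> dist(t, s.x())).reversed());
        for (Sample d : training) {
            if (neighbors.size() < k) {
                neighbors.add(d);                                  // N = N ∪ {d}
            } else if (dist(t, d.x()) < dist(t, neighbors.peek().x())) {
                neighbors.poll();                                  // N = N - {u}
                neighbors.add(d);                                  // N = N ∪ {d}
            }
        }
        // c = the class to which the most u ∈ N belong (unweighted majority vote).
        Map<String, Integer> votes = new HashMap<>();
        for (Sample s : neighbors) votes.merge(s.label(), 1, Integer::sum);
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        List<Sample> train = List.of(
            new Sample(new double[]{1, 1}, "A"),
            new Sample(new double[]{1, 2}, "A"),
            new Sample(new double[]{8, 8}, "B"),
            new Sample(new double[]{9, 8}, "B"));
        System.out.println(classify(train, new double[]{1.5, 1.5}, 3)); // prints A
    }
}
```

The priority queue keeps the neighbor set at size K in O(log K) per update, which matches the intent of steps (3)-(9) without rescanning N on every insertion.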
4. KNN advantages and disadvantages
Advantages: the principle is simple and the implementation is straightforward; it supports incremental learning; and it can model complex, hyperpolygonal decision regions.
Disadvantages: high computational overhead; it requires efficient storage techniques and benefits from parallel hardware support.
5. Java implementation
Refer to: Java implementation of the K-nearest-neighbor (KNN) algorithm
20151014_KNN distance-based classification algorithm