KNN algorithm and R language implementation (1)

Key points of the algorithm:
KNN (k-nearest neighbors)
1: k := the number of nearest neighbors, D := the training set
2: for each point z to be classified do
3:     compute the distance between z and every training sample (x, y) in D
4:     select the set of the k training samples nearest to z
5:     assign z to the class that occurs most often among the k samples from step 4 (majority vote; a from-scratch R sketch follows)
6: end for
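As a complement to the library-based code further below, the following is a minimal from-scratch sketch of this pseudocode in R (the function name knn.simple and the choice of Euclidean distance are assumptions for illustration, not from the original):

From-scratch sketch:
knn.simple = function(train.X, train.y, test.X, k = 3) {
  train.X = as.matrix(train.X); test.X = as.matrix(test.X)
  pred = character(nrow(test.X))
  for (i in 1:nrow(test.X)) {
    z = test.X[i, ]
    d = sqrt(rowSums(sweep(train.X, 2, z)^2))  # step 3: distance from z to every training sample
    nn = order(d)[1:k]                         # step 4: indices of the k nearest samples
    votes = table(train.y[nn])                 # step 5: majority vote among the k neighbors
    pred[i] = names(votes)[which.max(votes)]
  }
  factor(pred)
}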
Data:
library(ISLR)
names(Smarket)  # the Smarket data set ships with the ISLR package
KNN Code:
library(class)  # provides the knn() function
attach(Smarket)
train = (Year < 2005)  # logical index for the observations before 2005
train.X = cbind(Lag1, Lag2)[train, ]  # Lag1, Lag2 before 2005 form the training matrix
test.X = cbind(Lag1, Lag2)[!train, ]  # Lag1, Lag2 from 2005 onward form the test matrix
train.Direction = Direction[train]  # class labels of the training data
Direction.2005 = Direction[!train]  # class labels of the test data
set.seed(1)  # fix the random seed so the results are reproducible (knn() breaks ties at random)
knn.pred = knn(train.X, test.X, train.Direction, k = 3)
table(knn.pred, Direction.2005)  # confusion matrix
mean(knn.pred == Direction.2005)  # accuracy
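To probe the effect of k, one can loop over several values and compare test accuracy (a sketch reusing the objects defined above; the range 1:5 is an arbitrary choice):

acc = sapply(1:5, function(k) {
  set.seed(1)  # reset the seed so each run is comparable
  pred = knn(train.X, test.X, train.Direction, k = k)
  mean(pred == Direction.2005)
})
acc  # test accuracy for k = 1, ..., 5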
Experimental results:
        Direction.2005
knn.pred Down  Up
    Down   48  55
    Up     63  86
> mean(knn.pred == Direction.2005)
[1] 0.531746
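As a check, the accuracy can be read directly off the confusion matrix: (48 + 86) / (48 + 55 + 63 + 86) = 134 / 252 ≈ 0.5317, matching the output above.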
Algorithm Analysis:
Advantages: (i) KNN is instance-based (lazy) learning: it does not need to build a model and never maintains an abstraction (model) derived from the data; (ii) it can produce decision boundaries of arbitrary shape, whereas decision trees and rule-based classifiers are confined to rectilinear decision boundaries (hyperplanes in high dimensions).
Disadvantages: (i) classifying a test sample is expensive, O(n) per query where n is the size of the training set, since the distance to every training point must be computed; (ii) predictions are based only on local information, so when k is small the classifier is very sensitive to noisy data, whereas model-based classification algorithms fit the data as a whole.
Notes: (i) the choice of proximity measure and of data preprocessing matters. For example, when classifying people by (height, weight), height varies over a small range (0.2-2.5 m) while weight varies over a much larger one (5-150 kg); if the attribute scales are ignored, proximity is determined almost entirely by weight (a scaling sketch follows below). (ii) If k is too small, the class of z is decided by just the few points nearest to it; with so little supporting evidence from the training set, the classifier overfits and is sensitive to noise. If k is too large, too many points from other classes take part in the vote; in the extreme case k > n, every point is assigned to the majority class (underfitting).
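Following note (i), attributes can be standardized before distances are computed. A minimal sketch of this idea in R (the people data frame and its values are made up for illustration):

Scaling sketch:
people = data.frame(height = c(1.60, 1.75, 1.85), weight = c(55, 80, 110))
scaled = scale(people)  # rescale each column to zero mean and unit variance
dist(people)  # raw distances: dominated by the weight column
dist(scaled)  # scaled distances: both attributes contribute comparably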