Algorithm principle: we start from a set of training samples, each of which carries a tag (label); that is, we know the correspondence between every sample in the set and the category it belongs to. When new, unlabeled data arrives, we compare each feature of the new data with the corresponding features of the samples in the set and take the class labels of the most similar samples. Typically we select the k most similar samples in the set and assign the new data to the category that appears most frequently among those k labels. Put simply, the k-nearest neighbor algorithm classifies by measuring the distance between feature values.
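The voting idea above can be shown on a toy example. The points, labels, and function name below are made up purely for illustration; the real classifier for handwritten digits follows later in the post.

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    dists = [((x - query[0]) ** 2 + (y - query[1]) ** 2) ** 0.5
             for x, y in train_points]
    # indices of the k smallest distances
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    # most frequent label among the k nearest neighbors wins
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ['A', 'A', 'A', 'B', 'B', 'B']
print(knn_predict(points, labels, (1.1, 1.0)))  # query near cluster A -> prints A
```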
Algorithm advantages: high accuracy, insensitive to outliers, no assumptions about the input data.
Algorithm disadvantage: high time and space complexity, because the distance between each item to be classified and every sample in the sample set must be computed feature by feature.
Algorithm implementation (handwriting recognition)
1. Data preparation: the samples are 32*32-pixel black-and-white images of the digits 0-9, with about 200 samples per digit; the trainingDigits set is used to train the classifier and the testDigits set to test it. To make things easier to follow, the images have been converted to text format.
2. Code implementation:
Converting an image to a vector: we flatten the 32*32 binary image matrix into a 1*1024 vector with a function vector2d, as in the following code:
def vector2d(filename):
    rows = 32
    cols = 32
    imgVector = zeros((1, rows * cols))
    fileIn = open(filename)
    for row in xrange(rows):
        # each line of the text file is one row of 32 '0'/'1' characters
        lineStr = fileIn.readline()
        for col in xrange(cols):
            imgVector[0, row * 32 + col] = int(lineStr[col])
    fileIn.close()
    return imgVector
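To see what this flattening does, here is a version-neutral sketch of the same row * cols + col indexing on plain Python lists; the 4*4 size, the function name, and the in-memory "text image" are made up for illustration:

```python
def flatten_image(lines, rows=4, cols=4):
    # same indexing as vector2d: pixel (row, col) lands at position row * cols + col
    vec = [0] * (rows * cols)
    for row in range(rows):
        for col in range(cols):
            vec[row * cols + col] = int(lines[row][col])
    return vec

# a tiny made-up 4*4 "text image"
lines = ["0110",
         "1001",
         "1001",
         "0110"]
print(flatten_image(lines))
# prints [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
```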
Loading the trainingData and testData sets
def loadDataSet():
    print '.... Getting TrainingData'
    dataSetDir = 'd:/pythoncode/mlcode/knn/'
    trainingFileList = os.listdir(dataSetDir + 'trainingdigits')
    numSamples = len(trainingFileList)

    train_x = zeros((numSamples, 1024))
    train_y = []
    for i in xrange(numSamples):
        filename = trainingFileList[i]
        train_x[i, :] = vector2d(dataSetDir + 'trainingdigits/%s' % filename)
        # the file name encodes the digit before the underscore
        label = int(filename.split('_')[0])
        train_y.append(label)

    print '.... Getting TestingData...'
    testFileList = os.listdir(dataSetDir + 'testdigits')
    numSamples = len(testFileList)
    test_x = zeros((numSamples, 1024))
    test_y = []
    for i in xrange(numSamples):
        filename = testFileList[i]
        test_x[i, :] = vector2d(dataSetDir + 'testdigits/%s' % filename)
        label = int(filename.split('_')[0])
        test_y.append(label)

    return train_x, train_y, test_x, test_y
Constructing the classifier
from numpy import *
import os

def knnClassify(newInput, dataSet, labels, k):
    numSamples = dataSet.shape[0]

    # Euclidean distance between the new input and every training sample
    diff = tile(newInput, (numSamples, 1)) - dataSet
    squaredDiff = diff ** 2
    squaredDist = sum(squaredDiff, axis=1)
    distance = squaredDist ** 0.5

    # indices of the samples sorted by increasing distance
    sortedDistIndex = argsort(distance)

    # majority vote among the k nearest neighbors
    classCount = {}
    for i in xrange(k):
        votedLabel = labels[sortedDistIndex[i]]
        classCount[votedLabel] = classCount.get(votedLabel, 0) + 1

    maxValue = 0
    for key, value in classCount.items():
        if maxValue < value:
            maxValue = value
            maxIndex = key
    return maxIndex
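As an aside, the hand-rolled max-vote loop at the end of knnClassify can also be written with the standard library's collections.Counter; this is an alternative sketch, not the code used above:

```python
from collections import Counter

def majority_vote(voted_labels):
    # most_common(1) returns [(label, count)] for the top-voted label
    return Counter(voted_labels).most_common(1)[0][0]

print(majority_vote([3, 3, 5]))  # 3 wins with two votes -> prints 3
```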
Classification test
def testHandWritingClass():
    print 'Load Data....'
    train_x, train_y, test_x, test_y = loadDataSet()
    print 'Training ...'

    print 'Testing'
    numTestSamples = test_x.shape[0]
    matchCount = 0.0
    for i in xrange(numTestSamples):
        predict = knnClassify(test_x[i], train_x, train_y, 3)
        if predict != test_y[i]:
            print 'The predict is', predict, 'the target value is', test_y[i]
        else:
            matchCount += 1
    accuracy = float(matchCount) / numTestSamples

    print 'The accuracy is: %.2f%%' % (accuracy * 100)
Test results
testHandWritingClass()
Load Data....
.... Getting TrainingData
.... Getting TestingData...
Training ...
Testing
The predict is 7 the target value is 1
The predict is 9 the target value is 3
The predict is 9 the target value is 3
The predict is 3 the target value is 5
The predict is 6 the target value is 5
The predict is 6 the target value is 8
The predict is 3 the target value is 8
The predict is 1 the target value is 8
The predict is 1 the target value is 8
The predict is 1 the target value is 9
The predict is 7 the target value is 9
The accuracy is: 98.84%
Note: the code above was run under Python 2.7.11.
From the results above we can see that KNN classifies quite well. In my view, KNN is simple and crude: it compares the features of unclassified data against the features of already-classified data and takes the label of the most similar samples as its own. But a problem arises: if the new data's features are rare in the sample set, the chance of misclassification is high; conversely, if one class is heavily represented in the sample set, new data is more likely to be assigned to that class. To keep the classification fair, the votes need to be weighted.
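One common weighting scheme, sketched below, gives each neighbor a vote proportional to the inverse of its distance, so a few close samples can outvote many distant ones. The function name, the toy data, and the epsilon constant are my own choices, not part of the original code:

```python
def weighted_knn(train_x, train_y, query, k=3):
    """Distance-weighted kNN vote on plain Python lists (illustrative sketch)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, query)) ** 0.5
             for row in train_x]
    nearest = sorted(range(len(dists)), key=lambda i: dists[i])[:k]
    weights = {}
    for i in nearest:
        # 1e-6 avoids division by zero when a training point coincides with the query
        weights[train_y[i]] = weights.get(train_y[i], 0.0) + 1.0 / (dists[i] + 1e-6)
    # label with the largest accumulated weight wins
    return max(weights, key=weights.get)

train_x = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
train_y = ['near', 'near', 'far']
print(weighted_knn(train_x, train_y, [0.05, 0.0], k=3))  # prints near
```

Here both 'near' points are so close to the query that their combined weight dwarfs the single distant point, even though all three samples are inside the k = 3 neighborhood.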
Data Source: http://download.csdn.net/download/qq_17046229/7625323
Simple implementation of KNN algorithm