Simple implementation of KNN algorithm


Algorithm principle: we are given a set of training samples, each of which carries a label, i.e. we know the mapping between every sample in the training set and the category it belongs to. When new, unlabeled data arrives, we compare each feature of the new data with the corresponding features of the samples in the training set and look at the class labels of the most similar samples. In practice we take the k most similar samples (the k nearest neighbors) and assign the new data to the category that occurs most often among them. Put simply, the k-nearest-neighbor algorithm classifies by measuring the distance between feature vectors.
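To make the procedure concrete, here is a minimal sketch (not from the original article) that classifies a single 2-D point against a tiny hand-made training set using exactly this distance-and-vote scheme; the points, labels, and the helper name simpleKnn are invented for illustration.

from numpy import *

# Invented toy training set: four 2-D points belonging to classes 'A' and 'B'
dataSet = array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']

def simpleKnn(newInput, dataSet, labels, k):
    # Euclidean distance from newInput to every training point
    diff = tile(newInput, (dataSet.shape[0], 1)) - dataSet
    distance = (diff ** 2).sum(axis=1) ** 0.5
    # Take the k nearest neighbours and hold a majority vote over their labels
    votes = {}
    for i in argsort(distance)[:k]:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    return max(votes, key=votes.get)

print simpleKnn(array([0.9, 0.9]), dataSet, labels, 3)   # -> 'A'

The handwriting example below follows the same pattern, only with 1*1024 feature vectors instead of 2-D points.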

Algorithm advantages: high accuracy, insensitive to outliers, no assumptions about the input data.

Algorithm disadvantage: high time and space complexity, because the distance between every sample to be classified and every sample in the training set has to be computed.

Algorithm implementation (handwritten digit recognition)

1. Data preparation: we use 32*32-pixel black-and-white images of the digits 0-9, roughly 200 samples per digit (the trainingDigits directory is used to train the classifier, testDigits to test it). To make things easier to follow, the images have been converted to text format.

2. Code implementation:

Converting an image to a vector: we flatten each 32*32 binary image matrix into a 1*1024 vector with a function vector2d, as in the following code.

from numpy import *   # for zeros()

def vector2d(filename):
    # Read a 32*32 text-format digit image and flatten it into a 1*1024 vector
    rows = 32
    cols = 32
    imgVector = zeros((1, rows * cols))
    fileIn = open(filename)
    for row in xrange(rows):
        lineStr = fileIn.readline()
        for col in xrange(cols):
            imgVector[0, row * 32 + col] = int(lineStr[col])
    fileIn.close()
    return imgVector
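For reference, a quick sanity check of vector2d might look like the following; the path and filename are hypothetical and merely follow the '<digit>_<index>.txt' naming convention implied by the loading code below.

testVector = vector2d('trainingDigits/0_13.txt')   # hypothetical file
print testVector.shape        # (1, 1024)
print testVector[0, 0:31]     # the first 31 pixels, i.e. most of the first image row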

Loading the training and test data sets

def loadDataSet():
    '''Load the training and test sets from the text-format digit files.'''
    print '.... Getting TrainingData'
    dataSetDir = 'd:/pythoncode/mlcode/knn/'
    trainingFileList = os.listdir(dataSetDir + 'trainingDigits')
    numSamples = len(trainingFileList)

    train_x = zeros((numSamples, 1024))
    train_y = []
    for i in xrange(numSamples):
        filename = trainingFileList[i]
        train_x[i, :] = vector2d(dataSetDir + 'trainingDigits/%s' % filename)
        # The digit label is encoded in the filename, e.g. '3_45.txt' -> 3
        label = int(filename.split('_')[0])
        train_y.append(label)

    print '.... Getting TestingData...'
    testFileList = os.listdir(dataSetDir + 'testDigits')
    numSamples = len(testFileList)
    test_x = zeros((numSamples, 1024))
    test_y = []
    for i in xrange(numSamples):
        filename = testFileList[i]
        test_x[i, :] = vector2d(dataSetDir + 'testDigits/%s' % filename)
        label = int(filename.split('_')[0])
        test_y.append(label)

    return train_x, train_y, test_x, test_y

Constructing the classifier

from numpy import *
import os

def kNNClassify(newInput, dataSet, labels, k):
    numSamples = dataSet.shape[0]

    # Compute the Euclidean distance from newInput to every training sample
    diff = tile(newInput, (numSamples, 1)) - dataSet
    squaredDiff = diff ** 2
    squaredDist = sum(squaredDiff, axis=1)
    distance = squaredDist ** 0.5

    # Indices of the samples sorted by increasing distance
    sortedDistIndex = argsort(distance)

    # Vote among the k nearest neighbours
    classCount = {}
    for i in xrange(k):
        votedLabel = labels[sortedDistIndex[i]]
        classCount[votedLabel] = classCount.get(votedLabel, 0) + 1

    # Return the label with the most votes
    maxValue = 0
    for key, value in classCount.items():
        if maxValue < value:
            maxValue = value
            maxIndex = key
    return maxIndex
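As a quick usage sketch (reusing the invented 2-D toy data from earlier rather than the digit data), this shows the expected argument shapes: newInput is a single sample, dataSet holds one training sample per row, and labels runs parallel to the rows of dataSet.

dataSet = array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print kNNClassify(array([0.9, 0.9]), dataSet, labels, 3)   # -> 'A'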

Classification test

def testHandwritingClass():
    print 'Load Data....'
    train_x, train_y, test_x, test_y = loadDataSet()
    print 'Training...'   # kNN has no explicit training step

    print 'Testing'
    numTestSamples = test_x.shape[0]
    matchCount = 0.0
    for i in xrange(numTestSamples):
        predict = kNNClassify(test_x[i], train_x, train_y, 3)
        if predict != test_y[i]:
            print 'The predict is', predict, 'the target value is', test_y[i]
        if predict == test_y[i]:
            matchCount += 1
    accuracy = float(matchCount) / numTestSamples

    print 'The accuracy is: %.2f%%' % (accuracy * 100)

Test results

testHandwritingClass()
Load Data....
.... Getting TrainingData
.... Getting TestingData...
Training...
Testing
The predict is 7 the target value is 1
The predict is 9 the target value is 3
The predict is 9 the target value is 3
The predict is 3 the target value is 5
The predict is 6 the target value is 5
The predict is 6 the target value is 8
The predict is 3 the target value is 8
The predict is 1 the target value is 8
The predict is 1 the target value is 8
The predict is 1 the target value is 9
The predict is 7 the target value is 9
The accuracy is: 98.84%

Note: the code above was run under Python 2.7.11.

From the results above we can see that KNN classifies quite well. In my view KNN is simple and crude: it compares the features of the unlabeled data with the features of the already-classified samples and takes the label of the most similar ones as its own. The problem is that if the features of the new data are rare in the sample set, the chance of misclassification is high; conversely, if one class has far more samples than the others, new data is more likely to be pulled into that class. To keep the classification fair, the votes need to be weighted, for example as sketched below.
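One common way to add such weighting is to let each of the k neighbours vote with a weight that decays with its distance, e.g. 1/(distance + eps), instead of one vote each. Below is a minimal sketch of a distance-weighted variant of kNNClassify; the 1/(d + eps) scheme and the eps constant are illustrative choices of mine, not something specified in the original article, and it assumes the same from numpy import * environment as the classifier above.

def weightedKnnClassify(newInput, dataSet, labels, k, eps=1e-6):
    numSamples = dataSet.shape[0]
    # Same Euclidean distance computation as kNNClassify
    diff = tile(newInput, (numSamples, 1)) - dataSet
    distance = ((diff ** 2).sum(axis=1)) ** 0.5
    sortedDistIndex = argsort(distance)

    classCount = {}
    for i in xrange(k):
        votedLabel = labels[sortedDistIndex[i]]
        # Closer neighbours contribute larger votes; eps guards against division by zero
        weight = 1.0 / (distance[sortedDistIndex[i]] + eps)
        classCount[votedLabel] = classCount.get(votedLabel, 0) + weight

    # The label with the largest accumulated weight wins
    return max(classCount, key=classCount.get)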

Data Source: http://download.csdn.net/download/qq_17046229/7625323
