Implementing the KNN algorithm in Python, with a simple digit recognition method

Source: Internet
Author: User
This article describes how to implement the KNN algorithm in Python and how to use it for simple digit recognition. It is shared here for your reference. The details are as follows:

Advantages and disadvantages of the KNN algorithm:

Advantages: high accuracy, insensitive to outliers, no assumptions about the input data
Disadvantages: both time complexity and space complexity are high
Applicable data types: numerical and nominal

The idea of the algorithm:

KNN stands for the k-nearest neighbors algorithm. The idea is very simple; put briefly, birds of a feather flock together. From a set of known training samples, we find the k samples closest to the target, see which class most of them belong to, and use that as the classification of the target.
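
The idea above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the implementation used later in this article; the function name nearest_labels_vote is made up for the example, and the sample points are the same four points that createDataSet() returns further down.

```python
import math
from collections import Counter

def nearest_labels_vote(target, points, labels, k):
    """Return the most common label among the k points closest to target."""
    # Pair each training point's distance to the target with its label.
    distances = [(math.dist(target, p), label) for p, label in zip(points, labels)]
    # Keep the k closest points and count how often each label appears.
    k_nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

# Two points near (1, 1) labelled 'A', two points near (0, 0) labelled 'B'.
points = [(1.0, 0.9), (1.0, 1.0), (0.1, 0.2), (0.0, 0.1)]
labels = ['A', 'A', 'B', 'B']
print(nearest_labels_vote((0.0, 0.0), points, labels, 3))  # prints B
```

With k=3 the three closest points to (0, 0) are the two 'B' points and one 'A' point, so 'B' wins the vote two to one.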

Function notes:

Library functions:

tile()
For example, tile(A, n) repeats A n times.

The code is as follows:

>>> import numpy as np
>>> a = np.array([0, 1, 2])
>>> np.tile(a, 2)
array([0, 1, 2, 0, 1, 2])
>>> np.tile(a, (2, 2))
array([[0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2]])
>>> np.tile(a, (2, 1, 2))
array([[[0, 1, 2, 0, 1, 2]],
       [[0, 1, 2, 0, 1, 2]]])
>>> b = np.array([[1, 2], [3, 4]])
>>> np.tile(b, 2)
array([[1, 2, 1, 2],
       [3, 4, 3, 4]])
>>> np.tile(b, (2, 1))
array([[1, 2],
       [3, 4],
       [1, 2],
       [3, 4]])
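
In the classifier, tile() is used to stack the input vector once per training row so that it can be subtracted from the training matrix. As a side note, NumPy broadcasting produces the same difference matrix without the explicit repetition. The short sketch below compares the two; the variable names are chosen for this example only.

```python
import numpy as np

data_set = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
input_x = np.array([0.0, 0.0])

# Explicit repetition: stack input_x once per training row, then subtract.
diff_tiled = np.tile(input_x, (data_set.shape[0], 1)) - data_set

# Broadcasting: NumPy stretches input_x across the rows automatically.
diff_broadcast = input_x - data_set

print(np.array_equal(diff_tiled, diff_broadcast))  # prints True
```

Either form works; tile() makes the repetition visible, while broadcasting avoids building the repeated copy.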


Self-implemented functions

createDataSet(): generates the test array
kNNclassify(inputX, dataSet, labels, k): the classification function

inputX: the input vector to classify
dataSet: the training set
labels: the labels of the training set
k: the number of nearest neighbors

The code is as follows:


# coding=utf-8
from numpy import array, tile

def createDataSet():
    group = array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

# inputX is the input vector (the one whose category we want to determine)
# dataSet is the training samples
# labels holds the labels of the training samples
# k is the nearest-neighbor parameter: choose the k closest samples
def kNNclassify(inputX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # Start computing the Euclidean distance
    diffMat = tile(inputX, (dataSetSize, 1)) - dataSet

    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum each row of the matrix
    distances = sqDistances ** 0.5
    # Euclidean distance computed
    sortedDistance = distances.argsort()
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistance[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # Pick the label with the most votes among the k neighbors
    res = max(classCount, key=classCount.get)
    return res

def main():
    group, labels = createDataSet()
    t = kNNclassify([0, 0], group, labels, 3)
    print(t)

if __name__ == '__main__':
    main()
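
One detail in the vote-counting step deserves care: calling max() directly on a dictionary compares its keys, not its values, so it returns the largest label rather than the label with the most votes. The snippet below, using made-up vote counts, shows the difference and one way to select by count.

```python
# Hypothetical vote counts: 'A' has more votes than 'B'.
votes = {'A': 2, 'B': 1}

largest_key = max(votes)                 # compares the keys 'A' and 'B'
most_voted = max(votes, key=votes.get)   # compares the counts 2 and 1

print(largest_key)  # prints B
print(most_voted)   # prints A
```

On the four-point test set with the query [0, 0], both expressions happen to return 'B', which is why the problem is easy to miss.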

KNN Application Example

Implementing a handwriting recognition system

Data set:

There are two data sets: training and test. The class label is contained in the file name. Each sample is a 32*32 pixel image stored as text, which img2vec() in the code below reads one character per pixel.
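
To make the file layout concrete, here is a small sketch that writes one made-up sample and reads it back. It assumes each pixel is stored as a single character 0 or 1 and that the file name looks like 1_0.txt (label, underscore, sample index); both are assumptions inferred from how the code in this article parses the files, and the sample itself is invented.

```python
import os
import tempfile
import numpy as np

# Made-up sample: 32 rows of 32 characters, all '0' except a two-pixel-wide stroke.
rows = ['0' * 15 + '11' + '0' * 15 for _ in range(32)]

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical file name: the label (1) comes before the underscore.
    path = os.path.join(folder, '1_0.txt')
    with open(path, 'w') as f:
        f.write('\n'.join(rows) + '\n')

    # Read the file back: one integer per character, flattened to 1 x 1024.
    with open(path) as f:
        pixels = [[int(ch) for ch in line.strip()] for line in f]
    vec = np.array(pixels, dtype=float).reshape(1, -1)

    # Recover the label from the file name.
    label = int(os.path.basename(path).split('.')[0].split('_')[0])

print(vec.shape, int(vec.sum()), label)
```

The resulting 1 x 1024 vector is the form in which each image is compared against the training matrix.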

Method:

KNN is used as before, but the distance computation is heavier here (1,024 features per sample). Most of the work is in reading the data; once the data is loaded, the classifier can be called directly for the comparison.

Speed:

It is still relatively slow. Here the data sets contain 2,000+ training samples and 900+ test samples (on an i5 CPU).

With k=3 the run takes 32+ seconds.
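
One possible way to reduce the time spent on distance computation is to compute the distances from all test vectors to all training vectors in a single matrix operation instead of one call per test sample. The sketch below is not part of the original code and has not been timed on this data set; the function name pairwise_sq_distances and the random 0/1 data are made up for illustration. It relies on the identity |a - b|^2 = |a|^2 + |b|^2 - 2 a.b.

```python
import numpy as np

def pairwise_sq_distances(test_mat, train_mat):
    """Squared Euclidean distance between every test row and every training row."""
    test_sq = (test_mat ** 2).sum(axis=1)[:, np.newaxis]     # shape (n_test, 1)
    train_sq = (train_mat ** 2).sum(axis=1)[np.newaxis, :]   # shape (1, n_train)
    sq = test_sq + train_sq - 2.0 * (test_mat @ train_mat.T)
    # Clip tiny negative values that floating-point rounding can produce.
    return np.maximum(sq, 0.0)

# Small random stand-in for the 1024-feature image vectors.
rng = np.random.default_rng(0)
train = rng.integers(0, 2, size=(20, 1024)).astype(float)
test = rng.integers(0, 2, size=(5, 1024)).astype(float)

fast = pairwise_sq_distances(test, train)
# Reference result: subtract every pair explicitly, then sum the squares.
slow = ((test[:, np.newaxis, :] - train[np.newaxis, :, :]) ** 2).sum(axis=2)

print(np.allclose(fast, slow))  # prints True
```

Since the square root does not change the ordering, the k nearest neighbors of each test row can be taken from the squared distances directly, for example with argsort along axis 1.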

The code is as follows:


# coding=utf-8
from numpy import array, tile, zeros
import os
import time

def createDataSet():
    group = array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

# inputX is the input vector (the one whose category we want to determine)
# dataSet is the training samples
# labels holds the labels of the training samples
# k is the nearest-neighbor parameter: choose the k closest samples
def kNNclassify(inputX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # Start computing the Euclidean distance
    diffMat = tile(inputX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum each row of the matrix
    distances = sqDistances ** 0.5
    # Euclidean distance computed
    sortedDistance = distances.argsort()
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistance[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # Pick the label with the most votes among the k neighbors
    res = max(classCount, key=classCount.get)
    return res

def img2vec(filename):
    # Flatten a 32*32 text image into a 1*1024 vector
    returnVec = zeros((1, 1024))
    with open(filename) as fr:
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVec[0, 32 * i + j] = int(lineStr[j])
    return returnVec

def handwritingClassTest(trainingFolder, testFolder, k):
    hwLabels = []
    trainingFileList = os.listdir(trainingFolder)
    m = len(trainingFileList)
    trainingMat = zeros((m, 1024))
    for i in range(m):
        fileName = trainingFileList[i]
        fileStr = fileName.split('.')[0]
        # The label is the part of the file name before the underscore
        classNumStr = int(fileStr.split('_')[0])
        hwLabels.append(classNumStr)
        trainingMat[i, :] = img2vec(trainingFolder + '/' + fileName)
    testFileList = os.listdir(testFolder)
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):
        fileName = testFileList[i]
        fileStr = fileName.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vec(testFolder + '/' + fileName)
        classifierResult = kNNclassify(vectorUnderTest, trainingMat, hwLabels, k)
        # print(classifierResult, ' ', classNumStr)
        if classifierResult != classNumStr:
            errorCount += 1
    print('total error', errorCount)
    print('error rate', errorCount / mTest)

def main():
    t1 = time.perf_counter()
    handwritingClassTest('trainingdigits', 'testdigits', 3)
    t2 = time.perf_counter()
    print('execute', t2 - t1)

if __name__ == '__main__':
    main()

Hopefully this article will help you with Python programming.
